[ 
https://issues.apache.org/jira/browse/SPARK-27975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877836#comment-16877836
 ] 

Gabor Somogyi edited comment on SPARK-27975 at 7/3/19 1:48 PM:
---------------------------------------------------------------

After I moved the console sink from v1 to v2, the progress looks like this:
{code:java}
{
  "id" : "b0a8ffaa-900b-4a8b-8769-c2f43b9cf99b",
  "runId" : "6b4d01ed-7d0f-44de-8997-0b31bf2a106e",
  "name" : null,
  "timestamp" : "2019-07-03T12:55:17.104Z",
  "batchId" : 0,
  "numInputRows" : 1,
  "processedRowsPerSecond" : 0.5115089514066496,
  "durationMs" : {
    "addBatch" : 1486,
    "getBatch" : 3,
    "latestOffset" : 0,
    "queryPlanning" : 335,
    "triggerExecution" : 1955,
    "walCommit" : 69
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "MemoryStream[value#1]",
    "startOffset" : null,
    "endOffset" : 0,
    "numInputRows" : 1,
    "processedRowsPerSecond" : 0.5115089514066496
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleTable@373e6cb2",
    "numOutputRows" : 1
  }
}
{code}
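The sink description in such a progress report can also be checked programmatically. The following is only a minimal sketch in Python, assuming the progress JSON is available as a string; the helper names `sink_description` and `is_default_to_string` are hypothetical and not part of Spark. It tests whether the description is just the default Java `Object.toString()` form (`ClassName@hexhash`), as it is for ConsoleTable above.

```python
import json
import re

# Trimmed copy of the progress output shown above.
progress_json = """
{
  "batchId" : 0,
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleTable@373e6cb2",
    "numOutputRows" : 1
  }
}
"""

# Java's default Object.toString() yields "<class name>@<hex hash code>".
DEFAULT_TO_STRING = re.compile(r"^[\w.$]+@[0-9a-f]+$")


def sink_description(progress: str) -> str:
    """Return the sink description from a progress JSON string."""
    return json.loads(progress)["sink"]["description"]


def is_default_to_string(description: str) -> bool:
    """True if the description carries no alias or options, only Class@hash."""
    return DEFAULT_TO_STRING.match(description) is not None


description = sink_description(progress_json)
print(description, is_default_to_string(description))
```

A description such as `TextSocketV2[host: localhost, port: 8888]` does not match the pattern, so the check separates the two styles discussed in this issue.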
I think TextSocketV2 is different from ConsoleTable. In TextSocketV2 the 
mentioned parameters are keys for the instance, which is not the case for 
ConsoleTable. Additionally, in DSv2 the table and writer concepts are split, 
so extra references are required to get information from the writer.

I agree that at the moment it's not possible to see the parameters of 
ConsoleWrite, but in this case I suggest adding a log entry instead (consider 
a sink with 10+ different parameters, for example Kafka).
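
To make the trade-off concrete, here is a rough sketch in Python of the two alternatives: an inline description in the `Alias[key: value, ...]` style that TextSocketV2 uses, and a separate log entry that lists the options. This is not Spark code; `describe`, `log_options`, and the logger name are hypothetical, and the `numRows` and `truncate` values are only sample console sink options.

```python
import logging

logger = logging.getLogger("console_sink_sketch")  # hypothetical logger name


def describe(alias: str, options: dict) -> str:
    """Inline description in the style of TextSocketV2[host: ..., port: ...]."""
    body = ", ".join(f"{key}: {value}" for key, value in sorted(options.items()))
    return f"{alias}[{body}]"


def log_options(alias: str, options: dict) -> str:
    """Log the options once instead of embedding them in every progress report."""
    line = f"{alias} options: " + ", ".join(
        f"{key}={value}" for key, value in sorted(options.items())
    )
    logger.info(line)
    return line


print(describe("TextSocketV2", {"host": "localhost", "port": 8888}))
print(log_options("console", {"numRows": "20", "truncate": "false"}))
```

With only two options the inline form stays short, but with ten or more it would be repeated in every progress report, which is the argument for a single log entry.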





> ConsoleSink should display alias and options for streaming progress
> -------------------------------------------------------------------
>
>                 Key: SPARK-27975
>                 URL: https://issues.apache.org/jira/browse/SPARK-27975
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jacek Laskowski
>            Priority: Minor
>
> The {{console}} sink shows up in the progress output with its internal 
> instance representation:
> {code:json}
>   "sink" : {
>     "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@12fa674a"
>   }
> {code}
> That is not very user-friendly. It would be much better for debugging if it 
> included the alias and options, as {{socket}} does:
> {code}
>   "sources" : [ {
>     "description" : "TextSocketV2[host: localhost, port: 8888]",
>     ...
>   } ],
> {code}
> The entire sample progress looks as follows:
> {code}
> 19/06/07 11:47:18 INFO MicroBatchExecution: Streaming query made progress: {
>   "id" : "26bedc9f-076f-4b15-8e17-f09609aaecac",
>   "runId" : "8c365e74-7ac9-4fad-bf1b-397eb086661e",
>   "name" : "socket-console",
>   "timestamp" : "2019-06-07T09:47:18.969Z",
>   "batchId" : 2,
>   "numInputRows" : 0,
>   "inputRowsPerSecond" : 0.0,
>   "durationMs" : {
>     "getEndOffset" : 0,
>     "setOffsetRange" : 0,
>     "triggerExecution" : 0
>   },
>   "stateOperators" : [ ],
>   "sources" : [ {
>     "description" : "TextSocketV2[host: localhost, port: 8888]",
>     "startOffset" : 0,
>     "endOffset" : 0,
>     "numInputRows" : 0,
>     "inputRowsPerSecond" : 0.0
>   } ],
>   "sink" : {
>     "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@12fa674a"
>   }
> }
> {code}


