[jira] [Comment Edited] (SPARK-20597) KafkaSourceProvider falls back on path as synonym for topic

Satyajit varma (JIRA) Fri, 07 Jul 2017 10:11:16 -0700

    [ https://issues.apache.org/jira/browse/SPARK-20597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078395#comment-16078395 ]


Satyajit varma edited comment on SPARK-20597 at 7/7/17 5:10 PM:
----------------------------------------------------------------

Hi [~jlaskowski],

I am almost done with the change required above, and I would like to confirm 
a few things before I submit the PR (SPARK-20597).

1. In the ticket you say, "What seems a quite interesting option is to 
support start(path: String) as the least precedence option in which path would 
designate the default topic when no other options are used." Were you 
referring only to option("topic", "topic_name"), or to any other option as 
well, such as option("checkpointLocation", ...)?

I would like to check this with you because we would end up getting the error 
"org.apache.spark.sql.AnalysisException: checkpointLocation must be specified 
either through option("checkpointLocation", ...) or 
SparkSession.conf.set("spark.sql.streaming.checkpointLocation", ...);" 
if we try to execute the line below, because no checkpointLocation option has 
been provided:

     df.writeStream.format("kafka").start("topic")


2. Please find below the code that I am using to get the above functionality 
working (this is in KafkaSourceProvider.scala, line 145):

// Picks the default topic name from the "path" entry of the "parameters" Map
// when the Map has no "topic" entry but does have a "path" entry.
val defaultTopic = parameters.get(TOPIC_OPTION_KEY) match {
  case None => parameters.get(PATH_OPTION_KEY) match {
    case path: Option[String] => parameters.get(PATH_OPTION_KEY).map(_.trim)
    case _ => None
  }
  case topic: Option[String] => parameters.get(TOPIC_OPTION_KEY).map(_.trim)
}
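
For comparison, the same precedence (an explicit topic first, then path as the 
fallback) can be sketched with Option.orElse instead of nested pattern matches. 
This is only an illustration over a plain Scala Map, not the actual Spark code: 
the object name TopicFallback and the key values "topic" and "path" are 
assumptions made for the example.

```scala
object TopicFallback {
  // Assumed key names for this sketch; they stand in for
  // TOPIC_OPTION_KEY and PATH_OPTION_KEY from the snippet above.
  val TopicOptionKey = "topic"
  val PathOptionKey = "path"

  // Returns the explicit "topic" option if present, otherwise the "path"
  // option, trimmed in either case; None when neither key is set.
  def defaultTopic(parameters: Map[String, String]): Option[String] =
    parameters.get(TopicOptionKey)
      .orElse(parameters.get(PathOptionKey))
      .map(_.trim)
}
```

One edge case to keep in mind: a value that contains only whitespace comes back 
as Some("") in both versions. If a blank topic should count as absent, adding 
.filter(_.nonEmpty) after the trim would be one way to handle it.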

Let me know if this looks okay, or if I am missing any edge cases or anything 
else that I should be taking care of.
I am trying to be very careful, and because I am a newbie I would welcome the 
experts' feedback on the above approach, or any other feedback.

If this looks good, I can apply the same change in the createRelation method 
(line 163, KafkaSourceProvider.scala), test it for the topic column option (our 
other scenario to test), and submit the PR immediately.

Regards,
Satyajit.


> KafkaSourceProvider falls back on path as synonym for topic
> -----------------------------------------------------------
>
>                 Key: SPARK-20597
>                 URL: https://issues.apache.org/jira/browse/SPARK-20597
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Jacek Laskowski
>            Priority: Trivial
>              Labels: starter
>
> # {{KafkaSourceProvider}} supports {{topic}} option that sets the Kafka topic 
> to save a DataFrame's rows to
> # {{KafkaSourceProvider}} can use {{topic}} column to assign rows to Kafka 
> topics for writing
> What seems a quite interesting option is to support {{start(path: String)}} 
> as the least precedence option in which {{path}} would designate the default 
> topic when no other options are used.
> {code}
> df.writeStream.format("kafka").start("topic")
> {code}
> See 
> http://apache-spark-developers-list.1001551.n3.nabble.com/KafkaSourceProvider-Why-topic-option-and-column-without-reverting-to-path-as-the-least-priority-td21458.html
>  for discussion



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
