[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568348#comment-16568348 ]
Hyukjin Kwon edited comment on SPARK-24924 at 8/3/18 3:29 PM: -------------------------------------------------------------- {quote} but at the same time we aren't adding the spark.read.avro syntax so it break in that case or they get a different implementation by default? {quote} If users call this, it will still use the built-in implementation (https://github.com/databricks/spark-avro/blob/branch-4.0/src/main/scala/com/databricks/spark/avro/package.scala#L26), since it is just a short name for {{format("com.databricks.spark.avro")}}. {quote} our internal implementation which could very well be different. {quote} It shouldn't be very different for 2.4.0. It could diverge later, but I expect incremental improvements without behaviour changes. {quote} I would rather just plain error out saying these conflict, either update or change your external package to use a different name. {quote} IIRC, we did this for the CSV data source in the past, and many users complained about it: {code} java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name. {code} In practice, I am fairly confident in the current approach: users complained about that error a lot, and so far I have not seen complaints about the current behaviour. {quote} There is also the case one might be able to argue its breaking api compatilibity since .avro option went away, buts it a third party library so you can probably get away with that. {quote} It went away, but if the external jar is still provided, its implicit import should keep working as usual and, in theory, delegate to the internal implementation. If the jar is not given, the .avro API is not supported and the internal implementation will be used.
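The resolution rule discussed above (prefer the built-in source instead of raising the old "Multiple sources found" error) can be sketched as follows. This is an illustrative simplification, not Spark's actual lookup code (which lives in {{DataSource.lookupDataSource}}); the prefix check and function names here are assumptions for the sketch.

```python
# Hypothetical marker for providers that ship with Spark itself.
INTERNAL_PREFIX = "org.apache.spark.sql."


def resolve_provider(short_name, matches):
    """Pick a single provider class for a short name like 'csv' or 'avro'.

    `matches` is the list of fully qualified provider classes that
    registered themselves under `short_name`.
    """
    if len(matches) == 1:
        return matches[0]
    internal = [m for m in matches if m.startswith(INTERNAL_PREFIX)]
    if len(internal) == 1:
        # A built-in and an external provider collide: silently prefer
        # the built-in one rather than failing, per the discussion above.
        return internal[0]
    # No built-in to fall back on: surface the old ambiguity error.
    raise RuntimeError(
        "Multiple sources found for %s (%s), please specify the fully "
        "qualified class name." % (short_name, ", ".join(matches)))
```

With this rule, {{format("avro")}} resolves to the internal class even when the Databricks jar is on the classpath, while a clash between two external providers still errors out.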
> Add mapping for built-in Avro data source
> -----------------------------------------
>
>                 Key: SPARK-24924
>                 URL: https://issues.apache.org/jira/browse/SPARK-24924
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.4.0
>
> This issue aims to do the following:
> # Like the existing `com.databricks.spark.csv` mapping, map `com.databricks.spark.avro` to the built-in Avro data source.
> # Remove the now-incorrect error message, `Please find an Avro package at ...`.
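The legacy-name redirect the issue proposes can be sketched as a simple lookup table, modeled on the existing `com.databricks.spark.csv` mapping. The table contents below are illustrative only; Spark's real map is maintained inside {{DataSource.lookupDataSource}}, and the target class names are assumptions of this sketch.

```python
# Illustrative backward-compatibility table: legacy external provider
# names redirect to the built-in implementations.
BACKWARD_COMPAT_MAP = {
    "com.databricks.spark.csv":
        "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat",
    "com.databricks.spark.avro":
        "org.apache.spark.sql.avro.AvroFileFormat",
}


def map_legacy_name(provider):
    # Unknown names pass through unchanged and are resolved normally.
    return BACKWARD_COMPAT_MAP.get(provider, provider)
```

Under this scheme, {{format("com.databricks.spark.avro")}} transparently resolves to the built-in Avro source, so existing pipelines keep working without code changes.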