[
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569071#comment-16569071
]
Hyukjin Kwon commented on SPARK-24924:
--------------------------------------
If it already throws an error in the CSV case too, I would of course prefer to
have the improved error message.
{quote}
I don't buy this argument, the code has been restructured a lot and you could
have introduced bugs, behavior changes, etc.
{quote}
I have followed the changes in Avro and I don't think there are big
differences. We should keep the behaviours, in particular within 2.4.0. If I
missed something and this introduced a bug or a behaviour change, I personally
think we should fix it within 2.4.0. That was one of the key things I took into
account when I merged some changes.
{quote}
Users could have also made their own modified version of the databricks
spark-avro package (which we actually have to support primitive types) and thus
the implementation is not the same and yet you are assuming it is.
{quote}
In this case, users should provide their own short name for the package. I
would say it's discouraged to use the same name as Spark's builtin datasources
or other reserved package names. I wonder if users would actually try to use
the same name in practice.
{quote}
I'm worried about other users who didn't happen to see this jira.
{quote}
We will mention this in the release notes. I think I listed the possible
scenarios for this in
https://issues.apache.org/jira/browse/SPARK-24924?focusedCommentId=16567708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16567708
{quote}
I also realize these are 3rd party packages but I think we are making the
assumption here based on this being a databricks package, which in my opinion
we shouldn't. What if this was companyX package which we didn't know about,
what would/should be the expected behavior?
{quote}
I think the main reason for this is that the code was actually ported from the
Avro package under {{com.databricks.*}}. The worry here is that
{{com.databricks.*}} would be taken to indicate the builtin Avro, right?
{quote}
How many users complained about the csv thing?
{quote}
So far, I see some issues as below:
https://github.com/databricks/spark-csv/issues/367
https://github.com/databricks/spark-csv/issues/373
https://github.com/apache/spark/pull/17916#issuecomment-301898567
For clarification, it's not related to me in any way, but I thought we had
better keep it consistent with CSV's.
To sum up, I get your position, but I think the current approach makes a
coherent point too. In that case, I think we had better follow what we have
done with CSV.
> Add mapping for built-in Avro data source
> -----------------------------------------
>
> Key: SPARK-24924
> URL: https://issues.apache.org/jira/browse/SPARK-24924
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 2.4.0
>
>
> This issue aims to do the following:
> # Like the `com.databricks.spark.csv` mapping, we had better map
> `com.databricks.spark.avro` to the built-in Avro data source.
> # Remove the incorrect error message, `Please find an Avro package at ...`.
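
To make the mapping in the issue description concrete, here is a minimal sketch of the idea in Python. It is not Spark's actual implementation: the names `LEGACY_PROVIDER_ALIASES` and `resolve_provider` are hypothetical, and exact-match lookup is an assumption made for illustration.

```python
# Hypothetical alias table: legacy third-party provider names mapped to
# built-in short names. Illustrative only, not Spark's real lookup code.
LEGACY_PROVIDER_ALIASES = {
    "com.databricks.spark.csv": "csv",
    "com.databricks.spark.avro": "avro",
}


def resolve_provider(name: str) -> str:
    """Return the built-in short name for a known legacy provider name.

    Any name that is not in the alias table is returned unchanged, so an
    unrelated third-party package keeps resolving to itself.
    """
    return LEGACY_PROVIDER_ALIASES.get(name, name)
```

With exact-match lookup, only the two listed names are redirected. A differently named package, for example a hypothetical `com.companyx.spark.avro`, is left alone, which is the case the "companyX" question above asks about.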