[GitHub] [incubator-seatunnel] chenhu commented on a diff in pull request #1876: [Bug-fix][seatunnel-connector-spark-tidb]The dependency "tispark-assembly" should cause "Multiple sources found for parquet xxxxx"

GitBox Mon, 16 May 2022 20:09:02 -0700


chenhu commented on code in PR #1876:
URL: 
https://github.com/apache/incubator-seatunnel/pull/1876#discussion_r874320466



##########
seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-tidb/pom.xml:
##########
@@ -60,6 +60,10 @@
                     <groupId>mysql</groupId>
                     <artifactId>mysql-connector-java</artifactId>
                 </exclusion>
+                <exclusion>
+                    <groupId>org.apache.spark</groupId>
+                    <artifactId>spark-sql_${scala.binary.version}</artifactId>
+                </exclusion>

Review Comment:
   > As far as I know, excluding `spark-sql_` here will not affect the version of `spark-sql_` in tidb.
   
   When the version of spark-sql pulled in indirectly by tispark-assembly
differs from the spark-sql version declared in the root pom, reading parquet
data in some plugins (e.g. the hudi source) throws the exception below:
   ----------------
    Caused by: org.apache.spark.sql.AnalysisException: Multiple sources found 
for parquet 
(org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please 
specify the fully qualified class name.
   -----------------
   
   This issue is caused by tispark-assembly's spark-sql providing
ParquetDataSourceV2 while the root pom's spark-sql provides ParquetFileFormat.
Both are registered for parquet, so when reading a parquet data source Spark
cannot decide which implementation to use.
   
   This occurs in my case.
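   
   For reference, a rough sketch of how the exclusion would sit inside the
tispark-assembly dependency. The `com.pingcap.tispark` groupId and the
`${tispark.version}` property are assumptions for illustration, not taken from
this diff, so adjust them to the actual pom:

   ```xml
   <!-- Sketch only: groupId and version property are assumed, not from the PR -->
   <dependency>
       <groupId>com.pingcap.tispark</groupId>
       <artifactId>tispark-assembly</artifactId>
       <version>${tispark.version}</version>
       <exclusions>
           <!-- Keep the transitive spark-sql out so only the root pom's version is resolved -->
           <exclusion>
               <groupId>org.apache.spark</groupId>
               <artifactId>spark-sql_${scala.binary.version}</artifactId>
           </exclusion>
       </exclusions>
   </dependency>
   ```

   Running `mvn dependency:tree -Dincludes=org.apache.spark` before and after
the change should show whether a second spark-sql version is still resolved
transitively.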
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

