[ https://issues.apache.org/jira/browse/SPARK-45879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated SPARK-45879:
----------------------------------------
    Affects Version/s: 3.4.1
                       3.2.3

> Number check for InputFileBlockSources is missing for V2 source (BatchScan)?
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-45879
>                 URL: https://issues.apache.org/jira/browse/SPARK-45879
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.3, 3.4.1, 3.5.0
>         Environment: I tried on Spark 3.2.3 and Spark 3.4.1; both reproduce this issue.
>            Reporter: Liangcai li
>            Priority: Major
>
> When a query joins two tables and calls the "input_file_name()" function, it fails
> with an *AnalysisException* if the v1 data source (FileSourceScan) is used. That is
> the expected behavior.
>
> But if we switch to the v2 data source (BatchScan), the expected exception is gone
> and the join succeeds.
> Is this number check for InputFileBlockSources missing for the V2 data source, or
> is it by design?
>
> Repro steps:
> {code:java}
> scala> spark.range(100).withColumn("const1", lit("from_t1")).write.parquet("/data/tmp/t1")
>
> scala> spark.range(100).withColumn("const2", lit("from_t2")).write.parquet("/data/tmp/t2")
>
> scala> spark.conf.set("spark.sql.sources.useV1SourceList", "parquet")
>
> scala> spark.read.parquet("/data/tmp/t1").join(spark.read.parquet("/data/tmp/t2"), "id", "inner").selectExpr("*", "input_file_name()").show(5, false)
> org.apache.spark.sql.AnalysisException: 'input_file_name' does not support more than one sources.; line 1 pos 0;
> Project [id#376L, const1#377, const2#381, input_file_name() AS input_file_name()#389]
> +- Project [id#376L, const1#377, const2#381]
>    +- Join Inner, (id#376L = id#380L)
>       :- Relation [id#376L,const1#377] parquet
>       +- Relation [id#380L,const2#381] parquet
>
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
>   at org.apache.spark.sql.execution.datasources.PreReadCheck$.org$apache$spark$sql$execution$datasources$PreReadCheck$$checkNumInputFileBlockSources(rules.scala:476)
>   at org.apache.spark.sql.execution.datasources.PreReadCheck$.$anonfun$checkNumInputFileBlockSources$2(rules.scala:472)
>   at org.apache.spark.sql.execution.datasources.PreReadCheck$.$anonfun$checkNumInputFileBlockSources$2$adapted(rules.scala:472)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>
> scala> spark.conf.set("spark.sql.sources.useV1SourceList", "")
>
> scala> spark.read.parquet("/data/tmp/t1").join(spark.read.parquet("/data/tmp/t2"), "id", "inner").selectExpr("*", "input_file_name()").show(5, false)
> +---+-------+-------+---------------------------------------------------------------------------------------+
> |id |const1 |const2 |input_file_name()                                                                      |
> +---+-------+-------+---------------------------------------------------------------------------------------+
> |91 |from_t1|from_t2|file:///data/tmp/t1/part-00011-a52b9990-4463-447c-9cdf-7a84542de2f7-c000.snappy.parquet|
> |92 |from_t1|from_t2|file:///data/tmp/t1/part-00011-a52b9990-4463-447c-9cdf-7a84542de2f7-c000.snappy.parquet|
> |93 |from_t1|from_t2|file:///data/tmp/t1/part-00011-a52b9990-4463-447c-9cdf-7a84542de2f7-c000.snappy.parquet|
> |94 |from_t1|from_t2|file:///data/tmp/t1/part-00011-a52b9990-4463-447c-9cdf-7a84542de2f7-c000.snappy.parquet|
> |95 |from_t1|from_t2|file:///data/tmp/t1/part-00011-a52b9990-4463-447c-9cdf-7a84542de2f7-c000.snappy.parquet|
> +---+-------+-------+---------------------------------------------------------------------------------------+
> only showing top 5 rows{code}
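>
> For reference, below is a simplified sketch (not the verbatim Spark source) of the
> shape of the counting logic behind PreReadCheck.checkNumInputFileBlockSources from
> the stack trace above, plus a hypothetical extra case for V2 relations. Assuming the
> real rule only counts V1 relations (LogicalRelation over HadoopFsRelation), a
> DataSourceV2Relation would fall through to the generic LeafNode case, contribute a
> count of 0, and the multi-source check would never fire for BatchScan. The function
> name countInputFileBlockSources and the DataSourceV2Relation case are illustrative
> assumptions, not the actual Spark code or an agreed fix.
> {code:scala}
> // Paste into spark-shell. Simplified sketch of the PreReadCheck counting
> // logic; NOT the verbatim Spark source. The DataSourceV2Relation case is a
> // hypothetical fix, not current Spark behavior.
> import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
> import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Union}
> import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
> import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
>
> def countInputFileBlockSources(plan: LogicalPlan): Int = plan match {
>   case _: HiveTableRelation => 1
>   // V1 file scan: counted as one input-file source.
>   case r: LogicalRelation if r.relation.isInstanceOf[HadoopFsRelation] => 1
>   // Hypothetical fix: also count V2 relations so BatchScan joins fail the check.
>   case _: DataSourceV2Relation => 1
>   // Without the case above, a V2 relation matches here and contributes 0,
>   // so the multi-source check silently passes for BatchScan.
>   case _: LeafNode => 0
>   // The branches of a UNION never feed input_file_name() concurrently,
>   // so a union counts as at most one source.
>   case u: Union =>
>     if (u.children.map(countInputFileBlockSources).sum >= 1) 1 else 0
>   case other =>
>     val n = other.children.map(countInputFileBlockSources).sum
>     if (n > 1) {
>       // The real check raises an AnalysisException via failAnalysis here.
>       throw new IllegalStateException(
>         "'input_file_name' does not support more than one sources.")
>     }
>     n
> }
> {code}
> With the hypothetical DataSourceV2Relation case included, running this on the
> analyzed plan of the V2 join above would count two sources and fail, matching the
> V1 behavior.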