zhangxffff commented on issue #8524: URL: https://github.com/apache/arrow-datafusion/issues/8524#issuecomment-1857250251
> I think the reason this happens is that the `ListingTable` does `ObjectStore::list` which finds files in all subdirectories
>
> > I wonder is this behavior by design or a bug.
>
> As I understand it, DataFusion is trying to model the behavior of "Hive PartitionedTables" -- so to answer this question I think we need to research what Hive does in this case

I tried this with a Hive external table stored as Parquet. It seems that a Hive external table also does not scan Parquet files in subdirectories.

As shown in this picture, when the location is `hdfs:///user/hive/warehouse/zxf_test/`, there is no data in the external table. When the location is `hdfs:///user/hive/warehouse/zxf_test/subdir`, the external table has two records from two Parquet files.

I also tried a partitioned external table.

After creating the table, there is no data.

After specifying the location of partition `pt1`, we can get data from `hdfs:///user/hive/warehouse/zxf_test_pt/pt1`.

After also specifying the location of partition `pt2`, we can get data from both `hdfs:///user/hive/warehouse/zxf_test_pt/pt1` and `hdfs:///user/hive/warehouse/zxf_test_pt/pt2`.

If I copy a subdirectory containing a Parquet file into a Hive partition directory, Hive reports `java.io.IOException:java.io.IOException: Not a file`.

So it seems that Hive also does not scan Parquet files in subdirectories. For a Hive partitioned table, the user should specify the directory of each partition, and that directory should not contain any subdirectories.
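To make the two behaviors concrete, here is a minimal Python sketch that contrasts a recursive listing (roughly what the quoted comment says `ObjectStore::list` does) with a flat listing that ignores subdirectories (what the Hive experiment above suggests). It uses the local filesystem as a stand-in for HDFS, and the helper names `list_recursive`, `list_flat`, and `build_demo_layout` are hypothetical; none of this is DataFusion or Hive code.

```python
import tempfile
from pathlib import Path


def list_recursive(root: Path) -> list[str]:
    """Return every .parquet file under root, descending into subdirectories."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.parquet")
        if p.is_file()
    )


def list_flat(root: Path) -> list[str]:
    """Return only the .parquet files that sit directly inside root."""
    return sorted(p.name for p in root.glob("*.parquet") if p.is_file())


def build_demo_layout(root: Path) -> None:
    """Create root/subdir/{a,b}.parquet as empty placeholder files.

    This mirrors the layout described in the comment: the table location
    itself holds no files, and the data lives one level down.
    """
    sub = root / "subdir"
    sub.mkdir()
    (sub / "a.parquet").touch()
    (sub / "b.parquet").touch()


with tempfile.TemporaryDirectory() as tmp:
    table_root = Path(tmp)
    build_demo_layout(table_root)

    # Recursive listing at the table root finds the nested files.
    recursive_at_root = list_recursive(table_root)
    # Flat listing at the table root finds nothing.
    flat_at_root = list_flat(table_root)
    # Flat listing at the subdirectory finds both files.
    flat_at_subdir = list_flat(table_root / "subdir")

print(recursive_at_root)  # ['subdir/a.parquet', 'subdir/b.parquet']
print(flat_at_root)       # []
print(flat_at_subdir)     # ['a.parquet', 'b.parquet']
```

Under this simplified model, the flat listing matches the Hive results reported above: no rows when the location is the parent directory, and two files when it points at `subdir`.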
