tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100
PrestoSQL 336 with Hudi 0.5.3 gives a better error:
```
io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
    at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
    at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
    at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
    at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
    at java.base/java.util.ArrayList.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
    at java.base/java.lang.Iterable.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
    ... 6 more
```
After adding a log statement for `fullFileName`, I see the value is `part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet`, while for a table that can be queried, `fullFileName` is `4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet`.
S3 listing under the partition folder of the table that works (there is a `.hoodie/` folder under the base table path):
```
2020-03-24 15:18:55         93 .hoodie_partition_metadata
2020-03-24 15:18:57    2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet
```
S3 listing under the partition folder of the table that gets the error (there is a `.hoodie/` folder under the base table path):
```
2020-03-24 15:18:44          0 _SUCCESS
2020-03-24 15:18:37   10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:38    8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:39    9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:40    9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:41   10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42   10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42    9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:43    9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
```
**UPDATE**
This is a really old table that got corrupted along the way. After removing the `.hoodie/` folder, `SELECT` works OK.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]