beliefer opened a new issue, #11062:
URL: https://github.com/apache/incubator-gluten/issues/11062
### Backend
VL (Velox)
### Bug description
We have some Hive table with ORC format. Users want convert the Hive table
from ORC format to Parquet format. In the transition stage, some partitions
are already converted to Parquet format, but the others not.
We execute the SQL, such as:
select * from intermediate_state_table;
It is OK when we using Spark, but it is bad when using Gluten. We get the
following error.
```
Caused by: org.apache.gluten.exception.GlutenException: Exception:
VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: No magic bytes found at end of the Parquet file
Retriable: False
Expression: strncmp(copy.data() + readSize - 4, "PAR1", 4) == 0
Context: Split [Hive:
hdfs://path/intermediate_state_table/20240818/part-00000-cc17cd1a-35a6-4ec6-8694-3dcdcb95ccfc-c000
0 - 2545077] Task Gluten_Stage_0_TID_160_VTID_1
Additional Context: Operator: TableScan[0] 0
Function: loadFileMetaData
File:
/home/hadoop/gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp
Line: 216
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1
_ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2
_ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 _ZN8facebook5velox7parquet10ReaderBase16loadFileMetaDataEv
# 4
_ZN8facebook5velox7parquet10ReaderBaseC1ESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
# 5
_ZN8facebook5velox7parquet13ParquetReaderC2ESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
# 6
_ZN8facebook5velox7parquet20ParquetReaderFactory12createReaderESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
# 7 _ZN8facebook5velox9connector4hive11SplitReader12createReaderEv
# 8
_ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsE
# 9
_ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 10 _ZN8facebook5velox4exec9TableScan9getOutputEv
# 11
_ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE8_clEv
# 12
_ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 13 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 14 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 15 _ZN6gluten24WholeStageResultIterator4nextEv
# 16 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 17 0x00007f48b5018747
at
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native
Method)
at
org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
at
org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
... 27 more
```
### Gluten version
_No response_
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]