jinchengchenghh commented on PR #5447:
URL:
https://github.com/apache/incubator-gluten/pull/5447#issuecomment-2099651706
After I enabled HDFS in Arrow, I can successfully read a CSV file from HDFS.
```
scala> val filePath = "/input/student.csv"
filePath: String = /input/student.csv
scala> val df = spark.read.format("csv").option("header", "true").load(filePath)
E0508 10:25:42.841267 3005137 Exceptions.h:69] Line:
/mnt/DP_disk1/code/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1850,
Function:terminate, Expression: Cancelled, Source: RUNTIME, ErrorCode:
INVALID_STATE
df: org.apache.spark.sql.DataFrame = [Name: string, Language: string]
scala> df.show()
+-----+--------+
| Name|Language|
+-----+--------+
| Juno|    Java|
|Peter|  Python|
|Celin|     C++|
+-----+--------+
scala> print(df.queryExecution.executedPlan)
*(1) ColumnarToRow
+- ArrowFileScan arrowcsv [Name#17,Language#18] Batched: true, DataFilters:
[], Format: org.apache.gluten.datasource.ArrowCSVFileFormat@485f3327, Location:
InMemoryFileIndex(1 paths)[hdfs://0.0.0.0:9000/input/student.csv],
PartitionFilters: [], PushedFilters: [], ReadSchema:
struct<Name:string,Language:string>
```
My local protobuf version is shown below. I'm not sure whether the error is related to the protobuf version. @liujiayi771
```
root@sr249:/mnt/DP_disk2/tpcds/scripts# protoc --version
libprotoc 3.21.4
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]