Hello,
On my test box, I have both WebHDFS (running on port 50070) and HttpFS
(running on port 14000) configured.
Using a sample test program:
1) I can write and read the Parquet data files successfully via the
URI "webhdfs://isodvm122.in.ibm.com:50070/tmp/ravi/pqf1" (WebHDFS).
2) I can also write the Parquet data files successfully using the
URI "webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi/pqf1" (HttpFS).
3) However, when I try to read the same Parquet file (which reads
successfully via the WebHDFS port 50070) using the URI
"webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi/pqf1" (HttpFS), I
get the following error:
==============================
....................
....................
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/ravi/pqf1/pqf1
    at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:112)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:90)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:534)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:610)
==============================
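Both URIs use the "webhdfs" scheme, and Hadoop picks the FileSystem implementation from the scheme, so both reads go through the same client class (WebHdfsFileSystem, as in the trace above). A quick check with java.net.URI confirms that the two URIs agree on scheme, host and path and differ only in the port, which suggests the extra "/pqf1" is introduced after the URI is parsed. This is only a sketch; the class and helper names are made up for illustration.

```java
import java.net.URI;
import java.util.Objects;

public class CompareUris {
    // Hypothetical helper: true when two URIs agree on scheme, host and path but not on port.
    static boolean differOnlyInPort(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
                && Objects.equals(a.getHost(), b.getHost())
                && Objects.equals(a.getPath(), b.getPath())
                && a.getPort() != b.getPort();
    }

    public static void main(String[] args) {
        URI viaWebHdfs = URI.create("webhdfs://isodvm122.in.ibm.com:50070/tmp/ravi/pqf1");
        URI viaHttpFs = URI.create("webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi/pqf1");
        System.out.println(differOnlyInPort(viaWebHdfs, viaHttpFs)); // prints true
        System.out.println(viaHttpFs.getPath()); // prints /tmp/ravi/pqf1
    }
}
```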
Basically, I am trying to understand why it resolves the file path as
"/tmp/ravi/pqf1/pqf1" when the path specified in the URI is
"/tmp/ravi/pqf1". Is this due to some configuration in HttpFS, or is it a
bug in the ParquetReader API? Could you please let me know?
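One way to narrow this down is to ask both servers directly what they report for the same path. WebHDFS and HttpFS expose the same REST API under "/webhdfs/v1", so a LISTSTATUS call can be sent to each port and the "pathSuffix" field of the returned FileStatus entries compared. The sketch below assumes simple (pseudo) authentication via the "user.name" query parameter; the user name "ravi" and the class and method names are placeholders, and the regex-based extraction is a dependency-free shortcut rather than proper JSON parsing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListStatusCheck {
    // Builds a REST URL for op=LISTSTATUS on an HDFS path.
    // Assumes pseudo authentication via user.name (hypothetical user below).
    static URI listStatusUri(String host, int port, String hdfsPath, String user) {
        return URI.create("http://" + host + ":" + port + "/webhdfs/v1" + hdfsPath
                + "?op=LISTSTATUS&user.name=" + user);
    }

    // Extracts every "pathSuffix" value from a LISTSTATUS JSON response.
    // A regex keeps this sketch stdlib-only; real code should use a JSON parser.
    static List<String> pathSuffixes(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\"pathSuffix\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Same path on both servers: WebHDFS (50070) first, then HttpFS (14000).
        for (int port : new int[] {50070, 14000}) {
            URI uri = listStatusUri("isodvm122.in.ibm.com", port, "/tmp/ravi/pqf1", "ravi");
            HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(port + " -> HTTP " + resp.statusCode()
                    + ", pathSuffix values: " + pathSuffixes(resp.body()));
        }
    }
}
```

If the two ports print different "pathSuffix" values for the same file, that difference is a likely place to look.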
NOTE:
I am also able to read and write other file formats ("orc" or "delimited")
normally via the HttpFS URI. And I can read the Parquet data file "pqf1"
successfully using the URI "webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi"
(HttpFS), i.e. pointing at the folder, when "pqf1" is the only file in the
folder "/tmp/ravi".
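One possible explanation, not confirmed, would fit all three observations. Suppose the reader lists the given path before opening it, and the client builds each resulting path as the requested path plus the entry's "pathSuffix". Then a server that reports an empty suffix when a plain file is listed would yield "/tmp/ravi/pqf1", while a server that reports the file's own name would yield "/tmp/ravi/pqf1/pqf1". Listing the folder "/tmp/ravi" would work either way. The sketch below only models that path arithmetic with plain strings; "resolveChild" is a hypothetical helper, not the actual WebHdfsFileSystem code, and the per-server behaviour in the comments is an assumption.

```java
public class SuffixResolution {
    // Hypothetical model of how a listing client turns (requested path, pathSuffix)
    // into the path it opens next: an empty suffix means "the requested path itself".
    static String resolveChild(String requested, String pathSuffix) {
        if (pathSuffix.isEmpty()) {
            return requested;
        }
        String base = requested.endsWith("/") ? requested : requested + "/";
        return base + pathSuffix;
    }

    public static void main(String[] args) {
        // Listing the file, assuming the server reports an empty suffix (assumed WebHDFS behaviour).
        System.out.println(resolveChild("/tmp/ravi/pqf1", ""));     // prints /tmp/ravi/pqf1
        // Listing the file, assuming the server reports the file's own name (assumed HttpFS behaviour).
        System.out.println(resolveChild("/tmp/ravi/pqf1", "pqf1")); // prints /tmp/ravi/pqf1/pqf1
        // Listing the folder: the child's name is the correct suffix on either server.
        System.out.println(resolveChild("/tmp/ravi", "pqf1"));      // prints /tmp/ravi/pqf1
    }
}
```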
Thanks,
Ravi