Hello,

On my test box, I have both "webhdfs" (running on port 50070) and "httpfs" 
(running on port 14000) configured. 

Using a sample test-program:

1) I am able to write and read Parquet data files successfully via the 
URI "webhdfs://isodvm122.in.ibm.com:50070/tmp/ravi/pqf1" (webhdfs).
2) I am also able to write Parquet data files successfully using the 
URI "webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi/pqf1" (httpfs). 
3) However, when I try to read the same Parquet file (which can be read 
successfully via webhdfs on port 50070) using the URI 
"webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi/pqf1" (httpfs), I get 
the following error:

==============================
....................
....................
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/ravi/pqf1/pqf1
        at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:112)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:90)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:534)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:610)
==============================

Basically, I am trying to understand why the file path is being taken as 
"/tmp/ravi/pqf1/pqf1" when the file path specified in the URI is 
"/tmp/ravi/pqf1". Is this due to some configuration in "httpfs", or is it a 
bug or issue in the ParquetReader API? Could you please let me know?
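
To help narrow this down, the raw REST responses from the two ports could be compared for the same path, for example by checking fields such as "pathSuffix" in the returned FileStatus JSON. Below is a minimal sketch that only builds the request URLs; it does not contact the cluster. The class and helper names are hypothetical (not part of Hadoop or Parquet), and the user name "ravi" is an assumed placeholder for the "user.name" query parameter.

```java
import java.net.URI;

// Hypothetical helper class for illustration only; not part of Hadoop or Parquet.
class StatusUrlSketch {

    // Turns a webhdfs:// filesystem URI into the corresponding REST URL
    // (http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&user.name=<user>).
    static String restUrl(String fsUri, String op, String user) {
        URI u = URI.create(fsUri);
        return "http://" + u.getHost() + ":" + u.getPort()
                + "/webhdfs/v1" + u.getPath()
                + "?op=" + op + "&user.name=" + user;
    }

    public static void main(String[] args) {
        String[] uris = {
            "webhdfs://isodvm122.in.ibm.com:50070/tmp/ravi/pqf1", // webhdfs
            "webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi/pqf1"  // httpfs
        };
        String[] ops = { "GETFILESTATUS", "LISTSTATUS" };
        // "ravi" is an assumed user name; replace it with the actual HDFS user.
        for (String uri : uris) {
            for (String op : ops) {
                System.out.println(restUrl(uri, op, "ravi"));
            }
        }
    }
}
```

Each printed URL can be fetched with any HTTP client (for example curl); a difference between the two ports in the JSON returned for the same operation would suggest the problem is on the "httpfs" side rather than in the ParquetReader API.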

NOTE:
I am able to read and write other file formats ("orc" or "delimited") 
normally using the "httpfs" URI. I am also able to read the 
Parquet data file "pqf1" successfully using the URI 
"webhdfs://isodvm122.in.ibm.com:14000/tmp/ravi" (httpfs) when 
"pqf1" is the only file in the folder "/tmp/ravi".

Thanks,
 Ravi
