[jira] [Updated] (HIVE-11949) "LOCAL" in LOAD DATA LOCAL INPATH means "remote"

Antonio Piccolboni (JIRA) Tue, 06 Oct 2015 09:06:40 -0700

     [ https://issues.apache.org/jira/browse/HIVE-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Antonio Piccolboni updated HIVE-11949:
--------------------------------------
    Description: 
Originally filed as SPARK-10804 -- checked that it affects Hive in HDP2 as well.
Connecting to a remote thriftserver with a custom JDBC client or beeline,
LOAD DATA LOCAL INPATH fails. The HiveServer2 docs
([https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients], JDBC
client sample code) explain in a quick comment that local now means local to
the server. I think this is just a rationalization for a bug. When a user types
"local"

1. it needs to be local to the user, not to some server
2. failing 1., one needs a way to determine what local means and to create
a "local" item under the new definition.

With the thriftserver, I have a host to connect to, but I don't have any way to
create a file local to that host, at least in Spark. It may not be desirable to
create user directories on the thriftserver host or to run file transfer
services like scp. Moreover, it appears that this syntax is unique to Hive, but
its origin can be traced to LOAD DATA LOCAL INFILE in Oracle, which was adopted
by MySQL. In the latter's docs we can read: "If LOCAL is specified, the file is
read by the client program on the client host and sent to the server. The file
can be given as a full path name to specify its exact location. If given as a
relative path name, the name is interpreted relative to the directory in which
the client program was started". This is not to say that the Hive team is bound
to what Oracle and MySQL do, but to support the idea that the meaning of LOCAL
is settled. For instance, the Impala documentation says: "Currently, the Impala
LOAD DATA statement only imports files from HDFS, not from the local
filesystem. It does not support the LOCAL keyword of the Hive LOAD DATA
statement." I think this is a better solution if true client locality cannot
be implemented. As things stand in thriftserver, I developed a client under
the assumption that I could use LOAD DATA LOCAL INPATH, and all tests were
passing in standalone mode, only to find with the first distributed test that
it did not generalize beyond that.
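
To make the distinction concrete, here is a minimal sketch of how a client
might build the two forms of the statement. It is not part of Hive or of any
JDBC driver: the helper name build_load_statement, the server_local flag, the
identifier check, and the quote escaping are all assumptions made for
illustration. The idea is that a remote client would first stage the file in
HDFS by some other means and then issue the statement without LOCAL, since a
LOCAL path is resolved on the server host.

```python
import re

# Hypothetical helper, not a Hive or JDBC API. It only builds the statement
# text; sending it over a connection is left to the client.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_load_statement(path, table, server_local=False, overwrite=False):
    """Return a LOAD DATA statement for `path` and `table`.

    server_local=True emits LOCAL, which (per the HiveServer2 docs cited
    above) refers to the filesystem of the server host, not the client.
    server_local=False emits no LOCAL keyword, so the path refers to a file
    already staged in HDFS.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError("unexpected table name: %r" % (table,))
    # Assumption: backslash-escaping is sufficient for a quoted path literal.
    quoted = path.replace("\\", "\\\\").replace("'", "\\'")
    parts = ["LOAD DATA"]
    if server_local:
        parts.append("LOCAL")
    parts.append("INPATH '%s'" % quoted)
    if overwrite:
        parts.append("OVERWRITE")
    parts.append("INTO TABLE " + table)
    return " ".join(parts)


# Made-up path and table name, for illustration only.
print(build_load_statement("/user/demo/staging/data.csv", "demo_table"))
```

Keeping LOCAL behind an explicit flag named server_local is a deliberate
choice in this sketch: it forces the caller to acknowledge where the path
will actually be resolved.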



> "LOCAL" in LOAD DATA LOCAL INPATH means "remote"
> ------------------------------------------------
>
>                 Key: HIVE-11949
>                 URL: https://issues.apache.org/jira/browse/HIVE-11949
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.2.1
>            Reporter: Antonio Piccolboni
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
