Inserting into a table from a map/reduce transform results in no data
---------------------------------------------------------------------

                 Key: HIVE-233
                 URL: https://issues.apache.org/jira/browse/HIVE-233
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Josh Ferguson


Running a query of the form 

INSERT OVERWRITE TABLE table_name PARTITION ( ... ) SELECT TRANSFORM ... 

leaves the table 'table_name' empty, even though the SELECT statement itself 
returns valid results that can be loaded by hand.
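
For context, TRANSFORM streams each input row to the script on stdin as 
tab-separated fields, one row per line, and reads rows back from stdout in the 
same format. That makes the script easy to sanity-check by hand, e.g. (made-up 
actor ids, script path as in the failing query below):

# each output line should be actor_id<TAB>percentile<TAB>count
$ printf 'actor1\nactor2\n' | /Users/Josh/cluster/bin/percentiles.rb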

*My target table*

CREATE TABLE percentiles
(actor_id STRING, percentile INT, count INT) 
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT, 
span INT) 
CLUSTERED BY (actor_id) INTO 32 BUCKETS 
ROW FORMAT DELIMITED 
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;
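
(Not part of the original report, but for anyone reproducing this: the serde 
settings the DDL above produces, including the numeric delimiter codes, can be 
dumped with:)

hive> DESCRIBE EXTENDED percentiles;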

*The attempted query*

INSERT OVERWRITE TABLE percentiles PARTITION ( 
account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', 
hour=341976, span=168 ) SELECT TRANSFORM(actor_id) USING 
'/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count) FROM 
( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
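
A quick way to confirm the transform itself emits rows is to run the SELECT on 
its own, without the INSERT (a sketch; the LIMIT is only there to keep the 
output small):

hive> SELECT TRANSFORM(actor_id) USING 
'/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count) FROM 
( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors LIMIT 10;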

*The result in hadoop (a 0-byte file)*

$ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
-rw-r--r--   1 Josh supergroup          0 2009-01-14 11:19 
/user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0009_r_000000_0
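
Note the partition directory itself does get created; it just contains an 
empty file. Assuming the metastore registered it, it should also show up with:

hive> SHOW PARTITIONS percentiles;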

*Inserting into a temporary directory first*

hive> INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/tmp/hdfs_out' SELECT 
TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS 
(actor_id, percentile, count) FROM ( SELECT actor_id FROM activities CLUSTER BY 
actor_id ) actors;

*The results in hadoop (8600 bytes)*

$ hadoop fs -ls /tmp/hdfs_out
Found 1 items
-rw-r--r--   1 Josh supergroup       8600 2009-01-14 11:27 
/tmp/hdfs_out/attempt_200901131908_0010_r_000000_0

*Loading from the temporary directory into percentiles*

hive> LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hdfs_out' INTO TABLE 
percentiles PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', 
dataset='test', hour=341976, span=168 ); 
Loading data to table percentiles partition {account=cUU5T7y6DmdzMJFcFt3JDe, 
application=test, dataset=test, hour=341976, span=168}
OK

*The results in hadoop after the load (8600 bytes)*

$ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
-rw-r--r--   1 Josh supergroup       8600 2009-01-14 11:27 
/user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0010_r_000000_0
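
As a final check (not from the original session), a count over the loaded 
partition should now return the number of rows in that 8600-byte file; note 
that older Hive wants COUNT(1) rather than COUNT(*):

hive> SELECT COUNT(1) FROM percentiles WHERE 
account='cUU5T7y6DmdzMJFcFt3JDe' AND application='test' AND dataset='test' AND 
hour=341976 AND span=168;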

So the data loads fine via a LOAD DATA statement from a temporary directory, 
but not via a direct INSERT OVERWRITE TABLE ... SELECT TRANSFORM query.
