Inserting into a table from a map/reduce transform results in no data
---------------------------------------------------------------------
Key: HIVE-233
URL: https://issues.apache.org/jira/browse/HIVE-233
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Josh Ferguson

When attempting to run a query of the form

INSERT OVERWRITE TABLE table_name PARTITION ( ... ) SELECT TRANSFORM ...

the table 'table_name' ends up empty, even when the SELECT statement returns valid results that can be loaded by hand.

*My target table*

CREATE TABLE percentiles (actor_id STRING, percentile INT, count INT)
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT, span INT)
CLUSTERED BY (actor_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
  COLLECTION ITEMS TERMINATED BY '44'
  MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

*The attempted query*

INSERT OVERWRITE TABLE percentiles PARTITION (
  account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168
)
SELECT TRANSFORM(actor_id)
USING '/Users/Josh/cluster/bin/percentiles.rb'
AS (actor_id, percentile, count)
FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;

*The result in Hadoop (0 bytes)*

$ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
-rw-r--r--   1 Josh supergroup       0 2009-01-14 11:19 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0009_r_000000_0

*Inserting into a temporary directory first*

hive> INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/tmp/hdfs_out'
SELECT TRANSFORM(actor_id)
USING '/Users/Josh/cluster/bin/percentiles.rb'
AS (actor_id, percentile, count)
FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;

*The results in Hadoop (8600 bytes)*

$ hadoop fs -ls /tmp/hdfs_out
Found 1 items
-rw-r--r--   1 Josh supergroup    8600 2009-01-14 11:27 /tmp/hdfs_out/attempt_200901131908_0010_r_000000_0

*Loading from the temporary directory into percentiles*

hive> LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hdfs_out' INTO TABLE percentiles PARTITION (
  account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168
);
Loading data to table percentiles partition {account=cUU5T7y6DmdzMJFcFt3JDe, application=test, dataset=test, hour=341976, span=168}
OK

*The results in Hadoop (8600 bytes)*

$ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
-rw-r--r--   1 Josh supergroup    8600 2009-01-14 11:27 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0010_r_000000_0

So the data loads fine via a LOAD DATA statement from a temporary directory, but not via an INSERT OVERWRITE statement running the same query.

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
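For context on what the USING clause runs: Hive's TRANSFORM streams each input row to the script's stdin as tab-separated fields terminated by a newline, and parses each stdout line the same way into the columns named in AS (...). The real percentiles.rb is not shown in this report; the following is only a hypothetical sketch of a script with the same input/output shape, with the percentile computation elided.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of a TRANSFORM script shaped like percentiles.rb.
# Because the outer query is CLUSTER BY actor_id, equal actor_ids arrive on
# adjacent input lines, so a single streaming pass can group them.

# Collapse consecutive runs of the same actor_id into one output row:
# actor_id, a placeholder percentile (the real computation is elided), count.
def transform(lines)
  out = []
  current, count = nil, 0
  flush = lambda do
    # Emit one tab-separated row per actor; 0 stands in for the percentile.
    out << [current, 0, count].join("\t") unless current.nil?
  end
  lines.each do |line|
    actor_id = line.chomp.split("\t").first
    if actor_id == current
      count += 1
    else
      flush.call
      current, count = actor_id, 1
    end
  end
  flush.call
  out
end

# When run by Hive, stream stdin rows through the transform to stdout.
if $PROGRAM_NAME == __FILE__
  transform($stdin.each_line).each { |row| puts row }
end
```

Each output line must match the AS (actor_id, percentile, count) column list, in order, tab-separated; a malformed line would show up as NULLs in the target columns rather than as missing data.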