[ https://issues.apache.org/jira/browse/HIVE-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681556#action_12681556 ]
Josh Ferguson commented on HIVE-233:
------------------------------------

Sadly, because of this issue I have changed the way I write and execute all my queries, so I no longer have the original queries or setup around to reproduce it. If your test cases work, then maybe I was just doing something wrong. I may have another case in the future where I write a query like this one again, but it won't be for a while. Maybe this issue should just be marked as something else for now.

My activities table looks like this:

CREATE TABLE activities (occurred_at INT, actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

The script can output anything in the proper format and the results are the same.
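To be clear about what I mean by "the proper format": a trivial stand-in for percentiles.rb along these lines (not my actual script; the percentile and count values are placeholders, and it assumes Hive's default tab-separated TRANSFORM input and output) should be enough to reproduce the behavior:

#!/usr/bin/env ruby
# Trivial stand-in for percentiles.rb: Hive's TRANSFORM feeds each input row to
# stdin as tab-separated text and reads tab-separated output rows from stdout.
STDIN.each_line do |line|
  actor_id = line.chomp.split("\t").first
  next if actor_id.nil? || actor_id.empty?
  # Dummy percentile and count values -- any well-formed output behaves the same.
  puts [actor_id, 50, 1].join("\t")
end

Any script that emits three tab-separated columns per line behaves the same way in the query quoted below.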
> Inserting into a table from a map/reduce transform results in no data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-233
>                 URL: https://issues.apache.org/jira/browse/HIVE-233
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.0
>            Reporter: Josh Ferguson
>            Assignee: Josh Ferguson
>            Priority: Blocker
>
> When attempting to run a query of the form
>
> INSERT OVERWRITE TABLE table_name PARTITION ( ... ) SELECT TRANSFORM ...
>
> the table 'table_name' ends up empty even when the SELECT statement returns valid results that can be loaded by hand.
>
> *My target table*
>
> CREATE TABLE percentiles (actor_id STRING, percentile INT, count INT)
> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT, span INT)
> CLUSTERED BY (actor_id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> *The attempted query*
>
> INSERT OVERWRITE TABLE percentiles PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168 )
> SELECT TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count)
> FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
>
> *The result in hadoop (which is 0 bytes)*
>
> $ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
> -rw-r--r-- 1 Josh supergroup 0 2009-01-14 11:19 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0009_r_000000_0
>
> *Inserting into a temporary directory first*
>
> hive> INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/tmp/hdfs_out'
> SELECT TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count)
> FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
>
> *The results in hadoop (8600 bytes)*
>
> $ hadoop fs -ls /tmp/hdfs_out
> Found 1 items
> -rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27 /tmp/hdfs_out/attempt_200901131908_0010_r_000000_0
>
> *Loading from the temporary directory into percentiles*
>
> hive> LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hdfs_out' INTO TABLE percentiles
> PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168 );
> Loading data to table percentiles partition {account=cUU5T7y6DmdzMJFcFt3JDe, application=test, dataset=test, hour=341976, span=168}
> OK
>
> *The results in hadoop (8600 bytes)*
>
> $ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
> -rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0010_r_000000_0
>
> So it works fine when loading via a LOAD DATA statement from a temporary directory but not in a query with an INSERT OVERWRITE statement.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.