[ https://issues.apache.org/jira/browse/HIVE-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681556#action_12681556 ]
Josh Ferguson commented on HIVE-233:
------------------------------------

Sadly, because of this issue I have changed the way I write and execute all my queries, so I no longer have the original queries or setup around to reproduce it. If your test cases work, then maybe I was just doing something wrong. I may have another case in the future where I write a query like this one again, but it won't be for a while. Maybe this issue should just be marked as something else for now.

My activities table looks like this:

CREATE TABLE activities (occurred_at INT, actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

The script can output anything in the proper format and the results are the same.
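To be clear about what I mean by "the proper format": a trivial stand-in for percentiles.rb along these lines (not my actual script; the percentile and count values are placeholders, and it assumes Hive's default tab-separated TRANSFORM input and output) should be enough to reproduce the behavior:

#!/usr/bin/env ruby
# Trivial stand-in for percentiles.rb: Hive's TRANSFORM feeds each input row to
# stdin as tab-separated text and reads tab-separated output rows from stdout.
STDIN.each_line do |line|
  actor_id = line.chomp.split("\t").first
  next if actor_id.nil? || actor_id.empty?
  # Dummy percentile and count values -- any well-formed output behaves the same.
  puts [actor_id, 50, 1].join("\t")
end

Any script that emits three tab-separated columns per line behaves the same way in the query quoted below.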
> Inserting into a table from a map/reduce transform results in no data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-233
>                 URL: https://issues.apache.org/jira/browse/HIVE-233
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.0
>            Reporter: Josh Ferguson
>            Assignee: Josh Ferguson
>            Priority: Blocker
>
> When attempting to run a query of the form
>
> INSERT OVERWRITE TABLE table_name PARTITION ( ... ) SELECT TRANSFORM ...
>
> the table 'table_name' ends up empty even when the SELECT statement returns valid results that can be loaded by hand.
>
> *My target table*
>
> CREATE TABLE percentiles (actor_id STRING, percentile INT, count INT)
> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT, span INT)
> CLUSTERED BY (actor_id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> *The attempted query*
>
> INSERT OVERWRITE TABLE percentiles PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168 )
> SELECT TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count)
> FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
>
> *The result in hadoop (which is 0 bytes)*
>
> $ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
> -rw-r--r-- 1 Josh supergroup 0 2009-01-14 11:19 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0009_r_000000_0
>
> *Inserting into a temporary directory first*
>
> hive> INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/tmp/hdfs_out'
> SELECT TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count)
> FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
>
> *The results in hadoop (8600 bytes)*
>
> $ hadoop fs -ls /tmp/hdfs_out
> Found 1 items
> -rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27 /tmp/hdfs_out/attempt_200901131908_0010_r_000000_0
>
> *Loading from the temporary directory into percentiles*
>
> hive> LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hdfs_out' INTO TABLE percentiles
> PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168 );
> Loading data to table percentiles partition {account=cUU5T7y6DmdzMJFcFt3JDe, application=test, dataset=test, hour=341976, span=168}
> OK
>
> *The results in hadoop (8600 bytes)*
>
> $ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
> -rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0010_r_000000_0
>
> So it works fine when loading via a LOAD DATA statement from a temporary directory but not in a query with an INSERT OVERWRITE statement.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.