Re: CassandraStorage loader generating 2x many record?

2014-05-22 Thread Robert Coli
On Tue, May 20, 2014 at 1:44 PM, Kevin Burton bur...@spinn3r.com wrote:

 Either this is a bug or I'm insane.


If it turns out you're not insane (;D), I suggest filing a JIRA ticket with
your repro steps at:

http://issues.apache.org

=Rob


CassandraStorage loader generating 2x many record?

2014-05-20 Thread Kevin Burton
Either this is a bug or I'm insane.

Here's my table in Cassandra:

CREATE TABLE test_source (
  id int ,
  primary key(id)
);

INSERT INTO test_source (ID) VALUES(1);
INSERT INTO test_source (ID) VALUES(2);
INSERT INTO test_source (ID) VALUES(3);
INSERT INTO test_source (ID) VALUES(4);

cqlsh:blogindex> select * from test_source;

 id
----
  1
  2
  4
  3

(4 rows)

… now I load that into Pig and run:

test_source = LOAD 'cassandra://blogindex/test_source' USING
CassandraStorage() AS (source, target: bag {T: tuple(name, value)});

dump test_source;

(4,{((),)})
(1,{((),)})
(2,{((),)})
(4,{((),)})
(1,{((),)})
(3,{((),)})
(3,{((),)})
(2,{((),)})

… now it COULD be a bug with 'dump' … but even then that's a bug.

I suspect that Cassandra is getting confused and handing Pig too many rows,
perhaps because input splits are being duplicated?
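
One quick way to characterize the duplication is to tally the dumped records
by key. Here's a minimal sketch in Python, with the eight lines of dump output
from above pasted in as a string. The names dump_output and count_keys are
just illustrative, not part of Pig or Cassandra.

```python
from collections import Counter

# Output of `dump test_source;` copied from the message above.
dump_output = """\
(4,{((),)})
(1,{((),)})
(2,{((),)})
(4,{((),)})
(1,{((),)})
(3,{((),)})
(3,{((),)})
(2,{((),)})
"""


def count_keys(dump_text):
    """Count how many times each leading key appears in Pig dump output."""
    counts = Counter()
    for line in dump_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each record looks like "(key,{...})": take the text between
        # the opening "(" and the first ",".
        key = line[1:line.index(",")]
        counts[int(key)] += 1
    return counts


counts = count_keys(dump_output)
for key in sorted(counts):
    print(key, counts[key])
```

Every id shows up exactly twice, so the whole table seems to be read twice
rather than a few rows being repeated at random. That fits the duplicated
input split theory, but the tally alone doesn't prove it.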

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile:
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.