BulkOutputFormat and CQL3

2014-04-22 Thread James Campbell
Hi Cassandra Users- I have a Hadoop job that uses the pattern in Cassandra 2.0.6's hadoop_cql3_word_count example to load data from HDFS into Cassandra. Having read about BulkOutputFormat as a way to potentially significantly increase the write throughput from Hadoop to Cassandra,

Re: Bulkoutputformat

2013-12-17 Thread Aaron Morton
eed to use Hadoop then try the SSTableSimpleWriter and > sstableloader , this post is a little old but still relevant > http://www.datastax.com/dev/blog/bulk-loading > > Otherwise AFAIK BulkOutputFormat is what you want from hadoop > http://www.datastax.com/docs/1.1/cluster_archit

Re: Bulkoutputformat

2013-12-13 Thread varun allampalli
>> On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton >>> wrote: >>> >>>> If you don’t need to use Hadoop then try the SSTableSimpleWriter and >>>> sstableloader , this post is a little old but still relevant >>>> http://www.datastax.com/dev/blog/b

Re: Bulkoutputformat

2013-12-13 Thread Rahul Menon
pc_timeout. >> >> >> On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton wrote: >> >>> If you don’t need to use Hadoop then try the SSTableSimpleWriter and >>> sstableloader , this post is a little old but still relevant >>> http://www.datastax.com/dev/b

Re: Bulkoutputformat

2013-12-12 Thread varun allampalli
.com/dev/blog/bulk-loading >> >> Otherwise AFAIK BulkOutputFormat is what you want from hadoop >> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration >> >> Cheers >> >> - >> Aaron Morton >> New Zealand >> @aaronm

Re: Bulkoutputformat

2013-12-12 Thread varun allampalli
, 2013 at 7:58 PM, Aaron Morton wrote: > If you don’t need to use Hadoop then try the SSTableSimpleWriter and > sstableloader , this post is a little old but still relevant > http://www.datastax.com/dev/blog/bulk-loading > > Otherwise AFAIK BulkOutputFormat is what you want fro

Re: Bulkoutputformat

2013-12-11 Thread Aaron Morton
If you don’t need to use Hadoop then try the SSTableSimpleWriter and sstableloader , this post is a little old but still relevant http://www.datastax.com/dev/blog/bulk-loading Otherwise AFAIK BulkOutputFormat is what you want from hadoop http://www.datastax.com/docs/1.1/cluster_architecture

Bulkoutputformat

2013-12-11 Thread varun allampalli
Hi All, I want to bulk insert data into Cassandra. I was thinking of using BulkOutputFormat in Hadoop. Is it the best way, or is using the driver and doing batch inserts the better way? Are there any disadvantages of using BulkOutputFormat? Thanks for helping, Varun

Re: multiple reducers with BulkOutputFormat on the same host

2013-01-24 Thread Alexei Bakanov
, January 24, 2013 at 6:49 AM, Alexei Bakanov wrote: > > Hello, > > We see that BulkOutputFormat fails to stream data from multiple reduce > instances that run on the same host. > We get the same error messages that issue > https://issues.apache.org/jira/browse/CASSANDRA-4223 tries

Re: multiple reducers with BulkOutputFormat on the same host

2013-01-24 Thread Yuki Morishita
Alexei, You were right. It was already fixed to use UUID for streaming session and released in 1.2.0. See https://issues.apache.org/jira/browse/CASSANDRA-4813. On Thursday, January 24, 2013 at 6:49 AM, Alexei Bakanov wrote: > Hello, > > We see that BulkOutputFormat fails to stream

multiple reducers with BulkOutputFormat on the same host

2013-01-24 Thread Alexei Bakanov
Hello, We see that BulkOutputFormat fails to stream data from multiple reduce instances that run on the same host. We get the same error messages that issue https://issues.apache.org/jira/browse/CASSANDRA-4223 tries to address. Looks like (ip-address + in_out_flag + atomic integer) is not unique
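
The collision reported in this thread can be illustrated with a minimal Java sketch. It assumes, as the message above suggests, that a streaming session id was built from (IP address + in/out flag + a per-process atomic integer), and, as Yuki Morishita's reply says, that the fix released in 1.2.0 (CASSANDRA-4813) switched to a UUID. The class and method names are hypothetical and are not Cassandra's actual code.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from the Cassandra source tree.
public class SessionIdSketch {

    // Assumed old scheme: the counter is unique within one JVM only.
    public static String counterBasedId(String hostIp, boolean outbound, AtomicInteger counter) {
        return hostIp + ":" + (outbound ? "out" : "in") + ":" + counter.incrementAndGet();
    }

    // UUID-based scheme: no coordination between processes is needed.
    public static String uuidBasedId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Two reducers on the same host run in separate JVMs, so each one
        // starts with its own counter at zero. Two counters model that here.
        AtomicInteger reducerA = new AtomicInteger();
        AtomicInteger reducerB = new AtomicInteger();

        String a = counterBasedId("10.0.0.5", true, reducerA);
        String b = counterBasedId("10.0.0.5", true, reducerB);
        System.out.println("counter ids collide: " + a.equals(b));

        System.out.println("uuid ids collide: " + uuidBasedId().equals(uuidBasedId()));
    }
}
```

Both counter-based ids come out as "10.0.0.5:out:1", which is why the receiving node could not tell the two reducers' sessions apart under this assumed scheme.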

Re: BulkOutputFormat

2013-01-17 Thread Michael Kjellman
"user@cassandra.apache.org" Date: Thursday, January 17, 2013 10:39 AM To: "user@cassandra.apache.org" Subject: BulkOutputFormat Hello, I am facing issu

Re: BulkOutputFormat

2013-01-17 Thread chandra Varahala
"user@cassandra.apache.org" > Date: Thursday, January 17, 2013 10:39 AM > To: "user@cassandra.apache.org" > Subject: BulkOutputFormat > > Hello, > > I am facing issues with Bulkoutputformat loading data from hadoop to > cassandra. > > Cluster detai

Re: BulkOutputFormat

2013-01-17 Thread Michael Kjellman
y, January 17, 2013 10:39 AM To: "user@cassandra.apache.org" Subject: BulkOutputFormat Hello, I am facing issues with Bulkoutputformat loading data from hadoop to cassandra. Cluster details : we have 15 nod

RE: BulkOutputFormat error - org.apache.thrift.transport.TTransportException

2012-12-19 Thread ANAND_BALARAMAN
The problem was with the compatibility. I was using a lower version of Cassandra jar files. Now, BulkOutputFormat works fine. -Original Message- From: anand_balara...@homedepot.com [mailto:anand_balara...@homedepot.com] Sent: Friday, December 14, 2012 12:37 AM To: user

RE: BulkOutputFormat error - org.apache.thrift.transport.TTransportException

2012-12-13 Thread ANAND_BALARAMAN
: user@cassandra.apache.org Subject: Re: BulkOutputFormat error - org.apache.thrift.transport.TTransportException Looks like it cannot connect to the server >conf.set("cassandra.output.thrift.address", "localhost"); Is this the same address as the rpc_address in the cas

Re: BulkOutputFormat error - org.apache.thrift.transport.TTransportException

2012-12-13 Thread aaron morton
land @aaronmorton http://www.thelastpickle.com On 14/12/2012, at 9:57 AM, anand_balara...@homedepot.com wrote: > Hi > > I am a newbie to Cassandra. Was trying out a sample (word count) code on > BulkOutputFormat and got stuck with an error. > > What I am trying to do is – migrate

BulkOutputFormat error - org.apache.thrift.transport.TTransportException

2012-12-13 Thread ANAND_BALARAMAN
Hi I am a newbie to Cassandra. Was trying out a sample (word count) code on BulkOutputFormat and got stuck with an error. What I am trying to do is - migrate all Hive tables (from Hadoop cluster) to Cassandra column families. My MR program is configured to run on Hadoop cluster v 0.20.2

Fwd: Cassandra BulkOutputFormat with Hadoop MRv1

2012-11-12 Thread Uldis Barbans
Hello, Is BulkOutputFormat intended to be compatible with MRv1 (mapred) at all? I'm trying to write to Cassandra, roughly following the example at http://shareitexploreit.blogspot.se/2012/03/bulkloadto-cassandra-with-hadoop.html but with MRv1 - that is, calling output.collect(r

Re: EOFException with BulkOutputFormat in 1.1.6

2012-10-17 Thread Michael Kjellman
.org" Date: Wednesday, October 17, 2012 12:25 PM To: "user@cassandra.apache.org" Subject: EOFException with BulkOutputFormat in 1.1.6 I'm getting EOFExceptio

EOFException with BulkOutputFormat in 1.1.6

2012-10-17 Thread Michael Kjellman
I'm getting EOFExceptions with BulkOutputFormat 2012-10-17 12:23:01,182 ERROR org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor: Error in ThreadPoolExecutor java.lang.RuntimeException: java.io.EOFException at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:62

RE: Problem while streaming SSTables with BulkOutputFormat

2012-10-11 Thread Ralph Romanos
cluster. Does that make sense? Cheers, Ralph From: matgan...@hotmail.com To: user@cassandra.apache.org Subject: RE: Problem while streaming SSTables with BulkOutputFormat Date: Tue, 9 Oct 2012 22:29:41 + Aaron, Thank you for your answer. I tried to move to Cassandra 1.1.5, but the error still

RE: Problem while streaming SSTables with BulkOutputFormat

2012-10-09 Thread Ralph Romanos
of this issue? Cheers, Ralph > Subject: Re: Problem while streaming SSTables with BulkOutputFormat > From: aa...@thelastpickle.com > Date: Wed, 10 Oct 2012 10:05:13 +1300 > To: user@cassandra.apache.org > > Something, somewhere, at some point is breaking the connection. Sorry I

Re: Problem while streaming SSTables with BulkOutputFormat

2012-10-09 Thread aaron morton
both a task tracker and cassandra ? Cheers On 9/10/2012, at 4:06 AM, Ralph Romanos wrote: > Hello, > > I am using BulkOutputFormat to load data from a .csv file into Cassandra. I > am using Cassandra 1.1.3 and Hadoop 0.20.2. > I have 7 hadoop nodes: 1 namenode/jobtracker

Problem while streaming SSTables with BulkOutputFormat

2012-10-08 Thread Ralph Romanos
Hello, I am using BulkOutputFormat to load data from a .csv file into Cassandra. I am using Cassandra 1.1.3 and Hadoop 0.20.2. I have 7 hadoop nodes: 1 namenode/jobtracker and 6 datanodes/tasktrackers. Cassandra is installed on 4 of these 6 datanodes/tasktrackers. The issue happens when I have

Re: cassandra/hadoop BulkOutputFormat failures

2012-09-17 Thread Brian Jeltema
Jeltema > wrote: > >> I'm trying to do a bulk load from a Cassandra/Hadoop job using the >> BulkOutputFormat class. >> It appears that the reducers are generating the SSTables, but are failing to >> load them into the cluster: >> >> 12/09/14 14:08:13

Re: cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Jeremy Hanna
eason? If the temp dir hasn't been cleaned up yet, you are able to retry, fwiw. Jeremy On Sep 14, 2012, at 1:34 PM, Brian Jeltema wrote: > I'm trying to do a bulk load from a Cassandra/Hadoop job using the > BulkOutputFormat class. > It appears that the reducers are generat

cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Brian Jeltema
I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class. It appears that the reducers are generating the SSTables, but are failing to load them into the cluster: 12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_04_0, S

Re: stream data using bulkoutputformat

2012-05-04 Thread Jonathan Ellis
We're working on this over at https://issues.apache.org/jira/browse/CASSANDRA-4208 On Fri, May 4, 2012 at 4:56 PM, Shawna Qian wrote: > Hi Group: > > I am following this great example to use BulkOutputFormat to stream the > data from hadoop to cassandra. > http://shareitexploreit.blogspot.com/2

stream data using bulkoutputformat

2012-05-04 Thread Shawna Qian
Hi Group: I am following this great example to use BulkOutputFormat to stream the data from hadoop to cassandra. http://shareitexploreit.blogspot.com/2012/03/bulkloadto-cassandra-with-hadoop.html. It works perfectly when my keyspace has one cf. But in my case, I have 2 column families defined

Re: stream data using bulkoutputformat on hdfs?

2012-05-02 Thread Brandon Williams
On Wed, May 2, 2012 at 2:23 PM, Shawna Qian wrote: > Hello: > > I am trying to use bulkoutputformat and seeing some nice docs on how to use > it to stream the data to an existing cassandra cluster using configHelper > class.  I am wondering if it is possible to use it just to

stream data using bulkoutputformat on hdfs?

2012-05-02 Thread Shawna Qian
Hello: I am trying to use BulkOutputFormat and have seen some nice docs on how to use it to stream data to an existing Cassandra cluster using the ConfigHelper class. I am wondering if it is possible to use it just to stream the data (SSTables etc.) into HDFS? Thx Shawna

Re: Streaming sessions from BulkOutputFormat job being listed long after they were killed

2012-02-17 Thread Yuki Morishita
Friday, February 17, 2012 at 6:18 AM, Erik Forsberg wrote: > Hi! > > If I run a hadoop job that uses BulkOutputFormat to write data to > Cassandra, and that hadoop job is aborted, i.e. streaming sessions are > not completed, it seems like the streaming sessions hang around for a

Streaming sessions from BulkOutputFormat job being listed long after they were killed

2012-02-17 Thread Erik Forsberg
Hi! If I run a hadoop job that uses BulkOutputFormat to write data to Cassandra, and that hadoop job is aborted, i.e. streaming sessions are not completed, it seems like the streaming sessions hang around for a very long time, I've observed at least 12-15h, in output from 'nodetool

Re: Can I use BulkOutputFormat from 1.1 to load data to older Cassandra versions?

2012-01-09 Thread Brandon Williams
On Mon, Jan 9, 2012 at 1:18 AM, Erik Forsberg wrote: > Hi! > > Can the new BulkOutputFormat > (https://issues.apache.org/jira/browse/CASSANDRA-3045) be used to load data > to servers running cassandra 0.8.7 and/or Cassandra 1.0.6? > > I'm thinking of using jar files fr

Can I use BulkOutputFormat from 1.1 to load data to older Cassandra versions?

2012-01-08 Thread Erik Forsberg
Hi! Can the new BulkOutputFormat (https://issues.apache.org/jira/browse/CASSANDRA-3045) be used to load data to servers running cassandra 0.8.7 and/or Cassandra 1.0.6? I'm thinking of using jar files from the development version to load data onto a production cluster which I want to ke