I wanted to post back here that I solved the problem I was having with the input format in the Sqoop plugin. It was using the membase client to perform the TAP. Changing this to the couchbase client made it work. I figured it'd be useful to have this here in case other users run into the same issue. In the meantime, I've put the updated version of the input format here:
https://github.com/calrissian/couchbase-toolkit

I've done some work with Couchbase + Elasticsearch + Tinkerpop's Gremlin. I'm grabbing snapshots of graphs from Couchbase every hour with the InputFormat and it appears to be working well. It would definitely be faster if I were able to apply filters at the TAP level, though I know that's a complicated thing to ask for.

On Wednesday, March 12, 2014 9:48:51 PM UTC-4, Corey Nolet wrote:
>
> I *think* I may have isolated this issue to a client version, though it
> doesn't make sense to me why the Sqoop plugin isn't working. I'm going to
> try upgrading my client libs to the newest version.
>
> On Wednesday, March 12, 2014 4:03:07 PM UTC-4, Corey Nolet wrote:
>>
>> Would it be possible for someone to provide me with an effective example of
>> how to use the TapClient in couchbase/memcached with a Couchbase server
>> installation?
>>
>> I've been banging my head against the wall for days on this. I need to be
>> able to dump out my Couchbase keys/values every hour into HDFS so I can
>> map/reduce over them. I'm using CDH3u4 and the Sqoop connector is freezing
>> up when it begins its map/reduce job. Unfortunately I do not have the luxury
>> of updating to the Sqoop CDH4 version, but I've seen people complaining of
>> the same problems with that version.
>>
>> What I've tried is using the TapClient with both the Couchbase libraries
>> and the spy memcached libraries in Java. Even with exponential backoff, I
>> can't seem to get the TapClient to return a message where I can pull off a
>> key and a value (it appears I get null from getNextMessage() even with an
>> appropriate timeout of 5 minutes).
>>
>> What can I do to get this to work? I've been using Couchbase behind
>> Twitter Storm to help with caching for CEP. I've also been using it as a
>> real-time query engine of the underlying CEP cache with ElasticSearch for
>> my customer. If I can't dump the data out to HDFS directly, then I may need
>> to look at other options. I am trying to stay away from views because I
>> want to hit memory directly. I'd also like to preserve data locality if
>> possible (connect directly to memcached or tell Couchbase exactly which
>> node(s) I'd like to retrieve keys from).
>>
>> What are my options here?
>>
>> I'm wondering if BigCouch would allow me to do this effectively.
>>
>> Thanks much!
>>
>> On Monday, March 10, 2014 11:52:57 PM UTC-4, Corey Nolet wrote:
>>>
>>> I recently tried the Sqoop connector for Couchbase 2 and it doesn't
>>> appear to be working as expected. I have written my own InputFormat here:
>>>
>>> https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java
>>>
>>> I haven't gotten a chance to test it yet but I wanted to know if MOXI
>>> would make it hard to get the locality that I'm expecting from each of the
>>> memcached instances. When I connect to a memcached instance (backing
>>> Couchbase) on port 11211, will each of those memcached instances give me
>>> ALL of the keys in Couchbase, or will they only give me the keys that they
>>> contain separately?
>>>
>>> Thanks!
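
For anyone landing on this thread with the same symptom (a null coming back from the TAP message call), the "poll with exponential backoff" loop mentioned in the quoted message could look roughly like the sketch below. This is not the code from the toolkit or the Sqoop plugin. The class name BackoffPoller, the method pollWithBackoff, and all of the parameters are hypothetical, and the Supplier simply stands in for whatever call fetches the next TAP message in your client library.

```java
import java.util.function.Supplier;

public class BackoffPoller {

    /**
     * Calls source.get() until it returns non-null or maxAttempts is reached.
     * The sleep between attempts doubles each time, capped at maxDelayMs.
     * Returns null if every attempt was empty or the thread was interrupted.
     * (Hypothetical helper; not part of any Couchbase or spymemcached API.)
     */
    public static <T> T pollWithBackoff(Supplier<T> source, int maxAttempts,
                                        long initialDelayMs, long maxDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T message = source.get();
            if (message != null) {
                return message;
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the final attempt
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return null;
            }
            delayMs = Math.min(delayMs * 2, maxDelayMs);
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in source: empty on the first two polls, then one message.
        int[] calls = {0};
        String result = pollWithBackoff(
                () -> ++calls[0] < 3 ? null : "key=value", 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " polls");
    }
}
```

Note that backoff only helps if the source eventually produces messages. As described at the top of this thread, the actual fix here was switching client libraries, not tuning the retry loop.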
