I wanted to post back here that I solved the problem I was having with 
the input format in the Sqoop plugin. It was using the Membase client to 
open the TAP stream; switching it to the Couchbase client made it work. I 
figured it'd be useful to have this here in case other users run into the 
same issue. In the meantime, I've put the updated version of the input 
format here:

https://github.com/calrissian/couchbase-toolkit

I've done some work with Couchbase + Elasticsearch + TinkerPop's Gremlin. 
I'm grabbing snapshots of graphs from Couchbase every hour with the 
InputFormat, and it appears to be working well. It would definitely be 
faster, though, if I were able to perform filters at the TAP level... I know 
that's a complicated thing to ask for.
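Until filtering can happen at the TAP level, one workaround is to filter each message on the client as it arrives. Unwanted records then never reach the snapshot or HDFS. This is a minimal sketch, not the toolkit's code. `Record`, `filter`, and the `graph:` key prefix are hypothetical, and a plain `Iterator` stands in for whatever yields decoded TAP messages.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class ClientSideTapFilter {

    // Hypothetical stand-in for a decoded TAP mutation: a key and its raw value.
    public static final class Record {
        public final String key;
        public final byte[] value;

        public Record(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Drains the stream and keeps only records whose key passes the predicate.
    // Every record is still transferred; this only saves downstream work.
    public static List<Record> filter(Iterator<Record> stream, Predicate<String> keyFilter) {
        List<Record> kept = new ArrayList<>();
        while (stream.hasNext()) {
            Record r = stream.next();
            if (keyFilter.test(r.key)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Record> all = List.of(
            new Record("graph:v:1", new byte[] {1}),
            new Record("session:42", new byte[] {2}),
            new Record("graph:e:7", new byte[] {3}));
        // Hypothetical convention: graph data lives under the "graph:" prefix.
        for (Record r : filter(all.iterator(), k -> k.startsWith("graph:"))) {
            System.out.println(r.key);
        }
    }
}
```

The trade-off is that the full stream still crosses the network. That is why a filter applied at the TAP level, on the server side, would be faster.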

On Wednesday, March 12, 2014 9:48:51 PM UTC-4, Corey Nolet wrote:
>
> I *think* I may have isolated this issue to a client version, though it 
> doesn't make sense to me why the Sqoop plugin isn't working. I'm going to 
> try upgrading my client libs to the newest version.
>
> On Wednesday, March 12, 2014 4:03:07 PM UTC-4, Corey Nolet wrote:
>>
>> Would it be possible for someone to provide me with a working example of 
>> how to use the TapClient in couchbase/memcached with a Couchbase Server 
>> installation?
>>
>> I've been banging my head against the wall for days on this. I need to be 
>> able to dump my Couchbase keys/values into HDFS every hour so I can 
>> map/reduce over them. I'm using CDH3u4, and the Sqoop connector freezes 
>> as soon as it starts its map/reduce job. Unfortunately, I don't have the 
>> luxury of upgrading to the CDH4 version of Sqoop, but I've seen people 
>> complaining of the same problems with that version.
>>
>> What I've tried is using the TapClient from both the Couchbase libraries 
>> and the spymemcached libraries in Java. Even with exponential backoff, I 
>> can't get the TapClient to return a message from which I can pull a key 
>> and a value: getNextMessage() returns null even with an appropriate 
>> timeout of 5 minutes.
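For reference, here is a rough sketch of the polling loop described above: a timed fetch with exponential backoff. It rests on assumptions. A `BlockingQueue` stands in for the TAP client, and the `poll` call marks where real code would call the TapClient's timed `getNextMessage(...)`. `pollWithBackoff` and its parameters are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BackoffPoller {

    // Polls the source, doubling the wait after each empty poll (capped at
    // maxWaitMillis). Returns null after maxAttempts consecutive empty polls
    // or if the thread is interrupted.
    public static <T> T pollWithBackoff(BlockingQueue<T> source, long initialWaitMillis,
                                        long maxWaitMillis, int maxAttempts) {
        long wait = initialWaitMillis;
        try {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                // Stand-in for a timed getNextMessage(...) on the TAP client.
                T msg = source.poll(wait, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    return msg;
                }
                wait = Math.min(wait * 2, maxWaitMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        }
        return null;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("key1=value1");
        System.out.println(pollWithBackoff(q, 10, 80, 4)); // key1=value1
        System.out.println(pollWithBackoff(q, 10, 80, 4)); // null, after ~150 ms of waits
    }
}
```

Backoff only helps when messages are slow to arrive. If the stream never delivers anything, the loop just waits longer. The follow-ups in this thread point to the client library (Membase client vs. Couchbase client) rather than timing.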
>>
>> What can I do to get this to work? I've been using Couchbase behind 
>> Twitter Storm to help with caching for CEP. I've also been using it, 
>> together with Elasticsearch, as a real-time query engine over the 
>> underlying CEP cache for my customer. If I can't dump the data out to 
>> HDFS directly, I may need to look at other options. I'm trying to stay 
>> away from views because I want to hit memory directly. I'd also like to 
>> preserve data locality if possible (connect directly to memcached, or 
>> tell Couchbase exactly which node(s) I'd like to retrieve keys from).
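One way to get the locality asked about here, under assumptions: if the InputFormat knows the cluster's vBucket map (which node holds the active copy of each vBucket), it can build one split per node. Each split would contain only that node's vBuckets. The int-array map layout (with -1 meaning no active copy), the node names, and `splitsByHost` are hypothetical. Real code would read the map from the cluster configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VBucketSplits {

    // vbucketToServer[v] is the index into servers of the node holding the
    // active copy of vBucket v; -1 (assumed) means no active copy right now.
    public static Map<String, List<Integer>> splitsByHost(List<String> servers,
                                                         int[] vbucketToServer) {
        Map<String, List<Integer>> splits = new LinkedHashMap<>();
        for (String s : servers) {
            splits.put(s, new ArrayList<>());
        }
        for (int vb = 0; vb < vbucketToServer.length; vb++) {
            int owner = vbucketToServer[vb];
            if (owner < 0) {
                continue; // skip vBuckets without an active owner
            }
            splits.get(servers.get(owner)).add(vb);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("node1", "node2");
        int[] map = {0, 1, 0, 1, 1, 0, 0, 1}; // toy 8-vBucket map
        System.out.println(splitsByHost(servers, map));
        // {node1=[0, 2, 5, 6], node2=[1, 3, 4, 7]}
    }
}
```

In a Hadoop InputFormat, each map entry would become one `InputSplit` whose `getLocations()` returns that host. The scheduler can then try to run each map task on the node that already holds its data.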
>>
>> What are my options here?
>>
>>
>> I'm wondering if BigCouch would allow me to do this effectively.
>>
>> Thanks much!
>>
>>
>> On Monday, March 10, 2014 11:52:57 PM UTC-4, Corey Nolet wrote:
>>>
>>> I recently tried the Sqoop connector for Couchbase 2 and it doesn't 
>>> appear to be working as expected. I have written my own InputFormat here:
>>>
>>>
>>> https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java
>>>
>>> I haven't had a chance to test it yet, but I wanted to know whether Moxi 
>>> would make it hard to get the locality I'm expecting from each of the 
>>> memcached instances. When I connect to a memcached instance (backing 
>>> Couchbase) on port 11211, will each of those memcached instances give me 
>>> ALL of the keys in Couchbase, or only the keys that it holds itself?
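Some background on the per-node question, as I understand the Couchbase client libraries. Each bucket is split into vBuckets (1024 by default), and a key's vBucket is derived from a CRC32 hash of the key. Each node holds the active copies of only some vBuckets, so a single node's memcached holds only a subset of the keys. The sketch below assumes that hashing scheme. It is illustrative, not an authoritative copy of any client's code, and `vbucketFor` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketHash {

    // Assumed Couchbase-style mapping: CRC32 of the key bytes, keep bits
    // 16..30, then mask down to the vBucket count (must be a power of two).
    public static int vbucketFor(String key, int numVBuckets) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        long hash = (crc.getValue() >> 16) & 0x7fff;
        return (int) (hash & (numVBuckets - 1));
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user:1", "user:2", "graph:v:1"}) {
            System.out.println(key + " -> vBucket " + vbucketFor(key, 1024));
        }
    }
}
```

Because the mask assumes a power-of-two vBucket count, `vbucketFor(key, 64)` always equals `vbucketFor(key, 1024) & 63`.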
>>>
>>>
>>> Thanks!
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Couchbase" group.