Re: Efficiently filtering results directly in CS

2016-04-09 Thread vincent gromakowski
Spark over C* can push down a lot of things (from a basic filter or WHERE
clause to a more advanced semi-join).

2016-04-09 3:54 GMT+02:00 kurt Greaves :

> If you're using C* 3.0 you can probably achieve this with UDFs.
> http://www.planetcassandra.org/blog/user-defined-functions-in-cassandra-3-0/
>
> On 9 April 2016 at 00:22, Kevin Burton  wrote:
>
>> Ha..  Yes... C*...  I guess I need something like coprocessors in
>> bigtable.
>>
>> On Fri, Apr 8, 2016 at 1:49 AM, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> c* I suppose
>>>
>>> 2016-04-07 19:30 GMT+02:00 Jonathan Haddad :
>>>
>>>> What is CS?
>>>>
>>>> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton wrote:
>>>>
>>>>> I have a paging model whereby we stream data from CS by fetching
>>>>> 'pages' thereby reading (sequentially) entire datasets.
>>>>>
>>>>> We're using the bucket approach where we write data for 5 minutes,
>>>>> then we can just fetch the bucket for that range.
>>>>>
>>>>> Our app now has TONS of data and we have a piece of middleware that
>>>>> filters it based on the client requests.
>>>>>
>>>>> So if they only want English they just get English, and we filter away
>>>>> about 60% of our data.
>>>>>
>>>>> But it doesn't support condition pushdown.  So ALL this data has to be
>>>>> sent from our CS boxes to our middleware and filtered there (wasting a lot
>>>>> of network IO).
>>>>>
>>>>> Is there a way (including refactoring the code) that I could push
>>>>> this into CS?  Maybe some way I could discover the CS topology and put
>>>>> daemons on each of our CS boxes and fetch from CS directly (doing the
>>>>> filtering there).
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> --
>>>>>
>>>>> We’re hiring if you know of any awesome Java Devops or Linux
>>>>> Operations Engineers!
>>>>>
>>>>> Founder/CEO Spinn3r.com
>>>>> Location: *San Francisco, CA*
>>>>> blog: http://burtonator.wordpress.com
>>>>> … or check out my Google+ profile
>>>>>
>>>>>
>>>>>
>>>
>>
>>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Re: Efficiently filtering results directly in CS

2016-04-08 Thread kurt Greaves
If you're using C* 3.0 you can probably achieve this with UDFs.
http://www.planetcassandra.org/blog/user-defined-functions-in-cassandra-3-0/

On 9 April 2016 at 00:22, Kevin Burton  wrote:

> Ha..  Yes... C*...  I guess I need something like coprocessors in
> bigtable.
>
> On Fri, Apr 8, 2016 at 1:49 AM, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> c* I suppose
>>
>> 2016-04-07 19:30 GMT+02:00 Jonathan Haddad :
>>
>>> What is CS?
>>>
>>> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton  wrote:
>>>
>>>> I have a paging model whereby we stream data from CS by fetching
>>>> 'pages' thereby reading (sequentially) entire datasets.
>>>>
>>>> We're using the bucket approach where we write data for 5 minutes, then
>>>> we can just fetch the bucket for that range.
>>>>
>>>> Our app now has TONS of data and we have a piece of middleware that
>>>> filters it based on the client requests.
>>>>
>>>> So if they only want English they just get English, and we filter away
>>>> about 60% of our data.
>>>>
>>>> But it doesn't support condition pushdown.  So ALL this data has to be
>>>> sent from our CS boxes to our middleware and filtered there (wasting a lot
>>>> of network IO).
>>>>
>>>> Is there a way (including refactoring the code) that I could push
>>>> this into CS?  Maybe some way I could discover the CS topology and put
>>>> daemons on each of our CS boxes and fetch from CS directly (doing the
>>>> filtering there).
>>>>
>>>> Thoughts?
>>>>


>>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Efficiently filtering results directly in CS

2016-04-08 Thread Kevin Burton
Ha..  Yes... C*...  I guess I need something like coprocessors in bigtable.


On Fri, Apr 8, 2016 at 1:49 AM, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:

> c* I suppose
>
> 2016-04-07 19:30 GMT+02:00 Jonathan Haddad :
>
>> What is CS?
>>
>> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton  wrote:
>>
>>> I have a paging model whereby we stream data from CS by fetching 'pages'
>>> thereby reading (sequentially) entire datasets.
>>>
>>> We're using the bucket approach where we write data for 5 minutes, then
>>> we can just fetch the bucket for that range.
>>>
>>> Our app now has TONS of data and we have a piece of middleware that
>>> filters it based on the client requests.
>>>
>>> So if they only want English they just get English, and we filter away
>>> about 60% of our data.
>>>
>>> But it doesn't support condition pushdown.  So ALL this data has to be
>>> sent from our CS boxes to our middleware and filtered there (wasting a lot
>>> of network IO).
>>>
>>> Is there a way (including refactoring the code) that I could push this
>>> into CS?  Maybe some way I could discover the CS topology and put
>>> daemons on each of our CS boxes and fetch from CS directly (doing the
>>> filtering there).
>>>
>>> Thoughts?
>>>
>





Re: Efficiently filtering results directly in CS

2016-04-08 Thread vincent gromakowski
c* I suppose

2016-04-07 19:30 GMT+02:00 Jonathan Haddad :

> What is CS?
>
> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton  wrote:
>
>> I have a paging model whereby we stream data from CS by fetching 'pages'
>> thereby reading (sequentially) entire datasets.
>>
>> We're using the bucket approach where we write data for 5 minutes, then
>> we can just fetch the bucket for that range.
>>
>> Our app now has TONS of data and we have a piece of middleware that
>> filters it based on the client requests.
>>
>> So if they only want English they just get English, and we filter away
>> about 60% of our data.
>>
>> But it doesn't support condition pushdown.  So ALL this data has to be
>> sent from our CS boxes to our middleware and filtered there (wasting a lot
>> of network IO).
>>
>> Is there a way (including refactoring the code) that I could push this
>> into CS?  Maybe some way I could discover the CS topology and put
>> daemons on each of our CS boxes and fetch from CS directly (doing the
>> filtering there).
>>
>> Thoughts?
>>


Re: Efficiently filtering results directly in CS

2016-04-07 Thread Jonathan Haddad
What is CS?

On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton  wrote:

> I have a paging model whereby we stream data from CS by fetching 'pages'
> thereby reading (sequentially) entire datasets.
>
> We're using the bucket approach where we write data for 5 minutes, then we
> can just fetch the bucket for that range.
>
> Our app now has TONS of data and we have a piece of middleware that
> filters it based on the client requests.
>
> So if they only want English they just get English, and we filter away
> about 60% of our data.
>
> But it doesn't support condition pushdown.  So ALL this data has to be
> sent from our CS boxes to our middleware and filtered there (wasting a lot
> of network IO).
>
> Is there a way (including refactoring the code) that I could push this
> into CS?  Maybe some way I could discover the CS topology and put daemons
> on each of our CS boxes and fetch from CS directly (doing the filtering
> there).
>
> Thoughts?
>


Efficiently filtering results directly in CS

2016-04-07 Thread Kevin Burton
I have a paging model whereby we stream data from CS by fetching 'pages'
thereby reading (sequentially) entire datasets.

We're using the bucket approach where we write data for 5 minutes, then we
can just fetch the bucket for that range.
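
For illustration, the 5-minute bucketing could be computed roughly like this.
It is only a sketch: the names bucket_start and buckets_in_range, and the use
of epoch seconds as the bucket key, are assumptions and not details of the
actual setup.

```python
from typing import List

BUCKET_SECONDS = 5 * 60  # the 5-minute write window


def bucket_start(epoch_seconds: int) -> int:
    """Return the start (epoch seconds) of the bucket containing the timestamp."""
    return epoch_seconds - (epoch_seconds % BUCKET_SECONDS)


def buckets_in_range(start: int, end: int) -> List[int]:
    """List the bucket keys, in order, that cover every write in [start, end)."""
    if end <= start:
        return []
    return list(range(bucket_start(start), end, BUCKET_SECONDS))
```

Reading a whole dataset sequentially then amounts to walking the keys from
buckets_in_range and fetching each bucket page by page.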

Our app now has TONS of data and we have a piece of middleware that filters
it based on the client requests.

So if they only want English they just get English, and we filter away about
60% of our data.

But it doesn't support condition pushdown.  So ALL this data has to be sent
from our CS boxes to our middleware and filtered there (wasting a lot of
network IO).

Is there a way (including refactoring the code) that I could push this
into CS?  Maybe some way I could discover the CS topology and put daemons
on each of our CS boxes and fetch from CS directly (doing the filtering
there).
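
The "filter next to the data" idea can be sketched as below. This is only an
illustration: Row, its lang field and filter_page are hypothetical names, the
real schema is not shown here, and no Cassandra API is used. A real daemon
would read pages from its local node instead of an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape, for illustration only.
    bucket: int
    lang: str
    body: str


def filter_page(rows: Iterable[Row], wanted_lang: str) -> Iterator[Row]:
    """Yield only rows in the requested language, so the rest never leave the box."""
    for row in rows:
        if row.lang == wanted_lang:
            yield row


# One made-up page of rows from a single bucket.
page = [
    Row(300, "en", "hello"),
    Row(300, "fr", "bonjour"),
    Row(300, "de", "hallo"),
    Row(300, "en", "world"),
    Row(300, "es", "hola"),
]
kept = list(filter_page(page, "en"))
print(f"sent {len(kept)} of {len(page)} rows")  # prints "sent 2 of 5 rows"
```

With this made-up page, 3 of 5 rows (60%) are dropped before anything crosses
the network, which is the saving being asked about.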

Thoughts?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile