Thanks, Brandon, for the clarification. I'd like to support a use case where an index is built as a single wide row in a CF.
So, as a starting point for a query, a known row with a large number of
columns has to be selected, and the split across the Hadoop nodes should
happen at that level, i.e. within that single row.

Is this a common use case? Maybe there is a way to do this with the current
implementation itself that I'm not seeing. If so, could you share how?

On Mon, Sep 12, 2011 at 7:01 PM, Brandon Williams <dri...@gmail.com> wrote:
> On Mon, Sep 12, 2011 at 12:35 AM, Tharindu Mathew <mcclou...@gmail.com> wrote:
> > Hi,
> >
> > I plan to do $subject and contribute.
> >
> > Right now, the hadoop integration splits according to the number of rows
> > in a slice predicate. This doesn't scale if a row has a large number of
> > columns.
> >
> > I'd like to know from the cassandra-devs as to how feasible this is?
>
> It's feasible, but not entirely easy. Essentially you need to page
> through the row since you can't know how large it is beforehand. IIRC
> though, this breaks the current input format contract, since an entire
> row is expected to be returned.
>
> -Brandon

--
Regards,

Tharindu

blog: http://mackiemathew.com/
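PS: To make the paging approach Brandon describes concrete, here is a minimal
sketch (against the raw Thrift API) of walking a single wide row in fixed-size
column pages, which is roughly what a per-wide-row split would have to do under
the hood. The keyspace, column family, row key, and page size below are
illustrative assumptions on my part, not anything taken from the current input
format.

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class WideRowPagingSketch {
    private static final int PAGE_SIZE = 1000; // arbitrary batch size, tune as needed

    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                     // hypothetical keyspace
        ColumnParent parent = new ColumnParent("IndexCF");     // hypothetical CF holding the index row
        ByteBuffer rowKey = ByteBufferUtil.bytes("index-row"); // hypothetical wide row key

        ByteBuffer start = ByteBufferUtil.EMPTY_BYTE_BUFFER;   // empty start = beginning of the row
        boolean firstPage = true;
        while (true) {
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(
                new SliceRange(start, ByteBufferUtil.EMPTY_BYTE_BUFFER, false, PAGE_SIZE));
            List<ColumnOrSuperColumn> page =
                client.get_slice(rowKey, parent, predicate, ConsistencyLevel.ONE);

            for (ColumnOrSuperColumn cosc : page) {
                Column c = cosc.getColumn();
                // Every page after the first starts with the column we already
                // processed (it is reused as the new range start), so skip it.
                if (!firstPage && c.name.equals(start))
                    continue;
                System.out.println(ByteBufferUtil.string(c.name)); // hand off to the mapper here
            }

            if (page.size() < PAGE_SIZE)
                break;                                          // last page reached
            start = page.get(page.size() - 1).getColumn().name; // resume from the last column seen
            firstPage = false;
        }
        transport.close();
    }
}

Whether paging like this can live behind the record reader without breaking
the input format contract Brandon mentions is exactly the question I'd like to
work out.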