Thanks, Brandon, for the clarification. I'd like to support a use case where an index is built as a single wide row in a CF.
So, as a starting point for a query, a known row with a large number of
columns has to be selected, and the split across the Hadoop nodes should
happen at that level, i.e. within that single row.

Is this a common use case? Maybe there is a way to do this with the current
implementation itself that I'm not seeing. If so, could you share how?

On Mon, Sep 12, 2011 at 7:01 PM, Brandon Williams <dri...@gmail.com> wrote:
> On Mon, Sep 12, 2011 at 12:35 AM, Tharindu Mathew <mcclou...@gmail.com> wrote:
> > Hi,
> >
> > I plan to do $subject and contribute.
> >
> > Right now, the hadoop integration splits according to the number of rows
> > in a slice predicate. This doesn't scale if a row has a large number of
> > columns.
> >
> > I'd like to know from the cassandra-devs as to how feasible this is?
>
> It's feasible, but not entirely easy. Essentially you need to page
> through the row since you can't know how large it is beforehand. IIRC
> though, this breaks the current input format contract, since an entire
> row is expected to be returned.
>
> -Brandon

--
Regards,

Tharindu

blog: http://mackiemathew.com/
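PS: To make the paging approach Brandon describes concrete, here is a minimal
sketch (against the raw Thrift API) of walking a single wide row in fixed-size
column pages, which is roughly what a per-wide-row split would have to do under
the hood. The keyspace, column family, row key, and page size below are
illustrative assumptions on my part, not anything taken from the current input
format.

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class WideRowPagingSketch {
    private static final int PAGE_SIZE = 1000; // arbitrary batch size, tune as needed

    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                     // hypothetical keyspace
        ColumnParent parent = new ColumnParent("IndexCF");     // hypothetical CF holding the index row
        ByteBuffer rowKey = ByteBufferUtil.bytes("index-row"); // hypothetical wide row key

        ByteBuffer start = ByteBufferUtil.EMPTY_BYTE_BUFFER;   // empty start = beginning of the row
        boolean firstPage = true;
        while (true) {
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(
                new SliceRange(start, ByteBufferUtil.EMPTY_BYTE_BUFFER, false, PAGE_SIZE));
            List<ColumnOrSuperColumn> page =
                client.get_slice(rowKey, parent, predicate, ConsistencyLevel.ONE);

            for (ColumnOrSuperColumn cosc : page) {
                Column c = cosc.getColumn();
                // Every page after the first starts with the column we already
                // processed (it is reused as the new range start), so skip it.
                if (!firstPage && c.name.equals(start))
                    continue;
                System.out.println(ByteBufferUtil.string(c.name)); // hand off to the mapper here
            }

            if (page.size() < PAGE_SIZE)
                break;                                          // last page reached
            start = page.get(page.size() - 1).getColumn().name; // resume from the last column seen
            firstPage = false;
        }
        transport.close();
    }
}

Whether paging like this can live behind the record reader without breaking
the input format contract Brandon mentions is exactly the question I'd like to
work out.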