Re: Pig indexing

Dmitriy Ryaboy Wed, 22 Sep 2010 10:03:56 -0700

Renato,
Using indexes is "just" a matter of writing a loader that is aware of said
indexes. Merge join already builds an index and uses it as part of its
internals.
Now that filters are offered to loaders that claim to implement filter
push-down, there is no reason not to have a loader that looks up block
locations in some index and only creates splits for blocks that contain
values passing the filter, for example.  One thing to note is that there is
currently no automatic index creation (since you can load arbitrary data), so
you need to code up a way to look up which of the resources you are trying to
load have been indexed.
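
To make the idea concrete, here is a rough sketch of the split-pruning logic
in plain Python. It is not Pig's loader API: the index layout (a min/max key
per block), the file path, and every name in it are made up for illustration.
It only shows the two decisions such a loader has to make: is this resource
indexed at all, and if so, which blocks can possibly hold rows that pass the
pushed-down filter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BlockStats:
    """Hypothetical index entry: the key range stored in one block."""
    block_id: int
    min_key: int
    max_key: int


# Hypothetical index: file path -> per-block key ranges.
# A path that is missing from this dict has not been indexed.
INDEX: Dict[str, List[BlockStats]] = {
    "/data/events/part-00000": [
        BlockStats(0, 1, 100),
        BlockStats(1, 101, 200),
        BlockStats(2, 201, 300),
    ],
}


def blocks_to_read(path: str, total_blocks: int, lo: int, hi: int) -> List[int]:
    """Return ids of blocks that may contain keys in the range [lo, hi]."""
    stats = INDEX.get(path)
    if stats is None:
        # No index for this resource: fall back to reading every block.
        return list(range(total_blocks))
    # Keep a block only if its key range overlaps the filter range.
    return [s.block_id for s in stats if s.max_key >= lo and s.min_key <= hi]
```

A real loader would then create splits only for the returned blocks. The
fallback branch is the "which resources have been indexed" lookup mentioned
above: unindexed data still loads, it just gets no pruning.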


-D

On Tue, Sep 21, 2010 at 6:32 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi everyone!
>
> After reading Ed's email, I got really intrigued about Pig using indexes; I
> thought those were just plans lol
> But as commented in https://issues.apache.org/jira/browse/PIG-209, we
> could use indexing through Zebra, right? But that means that we would have
> to preload our data into Zebra, "sort it" in a similar way to the sorted
> table union example on the wiki, and then if we make a join using them, is
> this join done in a similar way to the work of Hung-chih Yang et al.?
> Are there any published papers or a technical overview on Pig/Zebra or
> MapReduce/Zebra?
> Thanks in advance.
>
>
> Renato M.
>
