Renato,

Using indexes is "just" a matter of writing a loader that is aware of said indexes. Merge join already builds an index and uses it internally. Now that filters are offered to loaders that claim to implement filter push-down, nothing stops you from writing a loader that looks up block locations in some index and creates splits only for the blocks that contain values matching the filter, for example. One thing to note is that there is currently no automatic index creation (since you can load arbitrary data), so you need to code up a way to look up which of the resources you are trying to load have been indexed.
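To make the split-pruning idea concrete, here is a minimal, self-contained sketch. It does not use Pig's real LoadFunc API. BlockIndex, Split and splitsFor are hypothetical names, and the sketch assumes an equality filter (column == value), fixed-size blocks, and an index you have already built yourself that maps each value to the offsets of the blocks containing it. In an actual loader, logic like this would live wherever the loader's InputFormat computes its splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class IndexedSplitPruning {

    /** Hypothetical split descriptor: a byte range of the input file. */
    record Split(long offset, long length) {}

    /** Hypothetical, user-built index: column value -> offsets of blocks containing it. */
    static final class BlockIndex {
        private final Map<String, Set<Long>> blocksByValue = new TreeMap<>();

        void add(String value, long blockOffset) {
            blocksByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(blockOffset);
        }

        Set<Long> lookup(String value) {
            return blocksByValue.getOrDefault(value, Set.of());
        }
    }

    /**
     * Creates splits only for blocks that may hold rows matching the
     * pushed-down equality filter; every other block is never read.
     */
    static List<Split> splitsFor(BlockIndex index, String filterValue, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long offset : index.lookup(filterValue)) {
            splits.add(new Split(offset, blockSize));
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;      // assumed 64 MB blocks
        BlockIndex index = new BlockIndex();
        index.add("apache", 0L);                 // block 0
        index.add("pig", blockSize);             // block 1
        index.add("apache", 2 * blockSize);      // block 2

        // Filter "col == 'apache'" touches blocks 0 and 2 only.
        List<Split> splits = splitsFor(index, "apache", blockSize);
        System.out.println(splits.size() + " of 3 blocks scanned: " + splits);
    }
}
```

A plain value-to-blocks map keeps the example small. A sorted index with per-block min/max keys would serve range filters as well, at the cost of occasionally scanning a block that turns out to have no matches.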
-D

On Tue, Sep 21, 2010 at 6:32 PM, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> Hi everyone!
>
> After reading Ed's email, I got really intrigued about Pig using indexes, I
> thought those were just plans lol
> But as commented in here https://issues.apache.org/jira/browse/PIG-209, we
> could use indexing through Zebra, right? But that means that we would have
> to preload our data into Zebra, "sort it" in a similar way to the sorted
> table union example of the wiki, and then if we make a join using them,
> this join is made in a similar way to the work of Hung-chih Yang et al. ??
> Is there any published papers or technical overview on Pig/Zebra or
> MapReduce/Zebra?
> Thanks in advanced.
>
> Renato M.