I think it'd be a good idea to discuss metadata storage in Parquet as well.
Right now we jam things into data pages (dictionary header) and file
footers (statistics objects), neither of which is good for fast queries.
If we hide metadata storage behind an interface, we could start doing
things like storing stats + dictionaries + indexes in a database (or a
SQLite file or whatever), and at job submit time we could query the
database to do fast push-down filters and such.
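
Very roughly, the submit-time lookup could look something like the sketch
below. This is Python + SQLite purely for illustration: the
`row_group_stats` table, its columns, and both function names are made up,
and nothing like this exists in Parquet today. It only shows min/max stats
for integer columns.

```python
import sqlite3


def build_stats_db(rows):
    """Create an in-memory SQLite DB holding per-row-group min/max stats.

    `rows` is an iterable of (file, row_group, column, min_value, max_value).
    The table layout is hypothetical, not an existing Parquet structure.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE row_group_stats ("
        " file TEXT, row_group INTEGER, column_name TEXT,"
        " min_value INTEGER, max_value INTEGER)"
    )
    conn.executemany("INSERT INTO row_group_stats VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def row_groups_for_range(conn, column, lo, hi):
    """Return (file, row_group) pairs whose [min, max] overlaps [lo, hi]."""
    cur = conn.execute(
        "SELECT file, row_group FROM row_group_stats"
        " WHERE column_name = ? AND max_value >= ? AND min_value <= ?"
        " ORDER BY file, row_group",
        (column, lo, hi),
    )
    return cur.fetchall()


# Example: three row groups across two files, filter on 120 <= user_id <= 160.
conn = build_stats_db([
    ("a.parquet", 0, "user_id", 1, 100),
    ("a.parquet", 1, "user_id", 101, 200),
    ("b.parquet", 0, "user_id", 150, 300),
])
candidates = row_groups_for_range(conn, "user_id", 120, 160)
print(candidates)
```

The point is that the planner only needs one cheap query to decide which
row groups to schedule, instead of opening every file footer.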

Another discussion worth having is full inverted indexes (like how Lucene
works) for search use cases. It might even be interesting to see if Parquet
can interop w/ Lucene (there are actually a lot of similarities between
Parquet and Lucene when it comes to data storage).
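
For anyone not familiar with the idea, here is a toy sketch of an inverted
index that maps terms to the row groups containing them. It is plain
Python, not Lucene's actual format or API; the function names and the
whitespace tokenizer are assumptions made for illustration only.

```python
from collections import defaultdict


def build_inverted_index(row_groups):
    """Map each term to the sorted list of row-group ids that contain it.

    `row_groups` is {row_group_id: iterable of string values}.
    Values are lowercased and split on whitespace (a stand-in tokenizer).
    """
    postings = defaultdict(set)
    for rg_id, values in row_groups.items():
        for value in values:
            for term in value.lower().split():
                postings[term].add(rg_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, *terms):
    """Return the row-group ids that contain every term (an AND query)."""
    if not terms:
        return []
    result = None
    for term in terms:
        ids = set(index.get(term.lower(), ()))
        result = ids if result is None else result & ids
    return sorted(result)


# Example: three row groups holding short text values.
index = build_inverted_index({
    0: ["parquet column store", "index pages"],
    1: ["lucene inverted index"],
    2: ["bloom filter", "parquet index"],
})
hits = search(index, "parquet", "index")
print(hits)
```

A reader would then only scan the row groups in `hits` rather than the
whole file.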

On Thu, Jul 9, 2015 at 9:56 AM, Ryan Blue <[email protected]> wrote:

> Lately, we've been discussing bloom filters on PARQUET-41. It looks like a
> good option for that is to put them into some form of index page that can
> be used or ignored.
>
> I'm really interested in hearing about the other ideas that have been
> thrown out on this.
>
> rb
>
> On 07/09/2015 09:22 AM, Jacques Nadeau wrote:
>
>> I think we should start a design discussion around this.  I think there
>> were early ideas by some of the initial authors.  However, I don't think
>> it has been designed.
>> On Jul 9, 2015 9:16 AM, "Patrick Woody" <[email protected]> wrote:
>>
>>> Just wanted to follow up here. Is there any information on index pages
>>> available?
>>>
>>> On Thu, Jul 2, 2015 at 4:22 PM, Patrick Woody <[email protected]>
>>> wrote:
>>>
>>>> Hey all,
>>>>
>>>> I've seen various mentions about Parquet index pages in the docs and
>>>> various slides/talks. Is there any up to date resource on what the plan
>>>> for these are?
>>>>
>>>> Thanks!
>>>> -Pat
>>>>
>>>>
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>



-- 
Alex Levenson
@THISWILLWORK
