Hi Aaron,

Thanks for the clarification. As for me, I really haven't made much
progress. The direction I was working in was to store the analyzer
definition in ZooKeeper in a different way, along with some minor changes
to the Thrift interface (type, strictTypes), and I generated the code. I
haven't been able to find time to contribute anything significant. If
there is anything specific you want me to take up, do let me know.

- Rahul


On Wed, Jul 17, 2013 at 5:00 PM, Aaron McCurry <[email protected]> wrote:

> Rahul,
>
> After giving it some thought, I think we should store all the metadata
> about tables in HDFS. Let me explain why. We have run into issues where
> my project wants to remove a table from Blur but not delete the indexes
> (maybe because it's a test system with multiple versions of the same
> data). The problem is that you can't just import the table back into
> Blur, because the column definitions are stored in ZooKeeper and they
> have already been destroyed.
>
> That's why I have implemented an HDFS field manager that stores the
> metadata in HDFS. I don't yet have a feel for how well this will work,
> but the base field manager is implemented to be storage agnostic, so any
> subclass only has to implement a way to store and load column
> definitions.
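A minimal sketch of the storage-agnostic split described above. All names here (`SketchFieldManager`, `tryToStore`, `tryToLoad`, `InMemoryFieldManager`) are hypothetical and not Blur's actual classes: the base class owns lookup and caching, and each subclass only decides where column definitions are persisted. An in-memory map stands in for ZooKeeper or HDFS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical base class: all storage-agnostic logic lives here.
abstract class SketchFieldManager {
  // Local cache of definitions already seen, keyed by "family.column".
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  // Subclass persists a definition; returns false if one already exists.
  protected abstract boolean tryToStore(String fieldName, String typeName);

  // Subclass loads a definition; returns null if none is stored.
  protected abstract String tryToLoad(String fieldName);

  public boolean addColumnDefinition(String family, String column, String typeName) {
    String fieldName = family + "." + column;
    boolean stored = tryToStore(fieldName, typeName);
    // Whether or not this call won, cache whatever is now persisted.
    String current = tryToLoad(fieldName);
    if (current != null) {
      cache.put(fieldName, current);
    }
    return stored;
  }

  public String getColumnDefinition(String family, String column) {
    // computeIfAbsent stores nothing when tryToLoad returns null.
    return cache.computeIfAbsent(family + "." + column, this::tryToLoad);
  }
}

// Hypothetical in-memory storage, standing in for ZooKeeper or HDFS.
class InMemoryFieldManager extends SketchFieldManager {
  private final Map<String, String> store = new ConcurrentHashMap<>();

  @Override
  protected boolean tryToStore(String fieldName, String typeName) {
    // putIfAbsent is atomic, so only the first writer wins.
    return store.putIfAbsent(fieldName, typeName) == null;
  }

  @Override
  protected String tryToLoad(String fieldName) {
    return store.get(fieldName);
  }
}
```

Swapping `InMemoryFieldManager` for a subclass backed by ZooKeeper or HDFS would leave `addColumnDefinition` untouched.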
>
> Further, the model we are implementing is to write a column definition
> once per field and never modify it, which I think fits well within
> HDFS's capabilities. HDFS enforces atomic file creation, so no two nodes
> can create the same column definition, at least the way I have
> implemented it.
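The write-once idea can be sketched with `java.nio`'s `StandardOpenOption.CREATE_NEW`, which fails atomically if the file already exists. This is a local-filesystem stand-in for HDFS's create-without-overwrite (on HDFS the analogous call would be `FileSystem.create(path, false)`); the one-file-per-`family.column` layout and the `WriteOnceColumnStore` class are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical store: one file per "family.column", written once, never modified.
class WriteOnceColumnStore {
  private final Path dir;

  WriteOnceColumnStore(Path dir) {
    this.dir = dir;
  }

  // Returns true if this caller created the definition, false if it already existed.
  boolean tryToStore(String fieldName, String typeName) throws IOException {
    Path file = dir.resolve(fieldName);
    try {
      // CREATE_NEW makes the existence check and the creation one atomic step.
      Files.write(file, typeName.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
      return true;
    } catch (FileAlreadyExistsException e) {
      // Another writer got there first; its definition stands.
      return false;
    }
  }

  // Returns the stored type name, or null if no definition exists.
  String tryToLoad(String fieldName) throws IOException {
    Path file = dir.resolve(fieldName);
    if (!Files.exists(file)) {
      return null;
    }
    return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }
}
```

Only the creation is atomic, not the content write, so a concurrent reader could briefly see an empty file; a real implementation would likely need to treat that case as "not yet available".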
>
> Take a look at what's there and let me know what you think. Thanks!
>
> Aaron
>
> Sent from my iPad
>
> On Jul 17, 2013, at 11:53 AM, rahul challapalli <[email protected]> wrote:
>
> > Hi Aaron,
> >
> > Can you elaborate on your thoughts about how to store the Analyzer
> > Definition in zookeeper?
> >
> > The example below is from my earlier notes. Let me know what you think.
> >
> > /blur/default/tables/words/default-column-definition : value
> >
> > /blur/default/tables/words/column-families/fam1/default-column-definition : value
> >
> > /blur/default/tables/words/column-families/fam1/col1 : value
> >
> > /blur/default/tables/words/column-families/fam1/col2 : value
> >
> >
> > - Rahul
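The ZooKeeper layout in Rahul's notes can be sketched as a path builder plus a lookup that falls back from the column node to the family default and then the table default. The fallback order is an assumption inferred from the two `default-column-definition` nodes, a `Map` stands in for ZooKeeper, and `ZkLayout` with its methods is hypothetical.

```java
import java.util.Map;

// Hypothetical helper mirroring /blur/<cluster>/tables/<table>/... from the notes.
final class ZkLayout {
  private ZkLayout() {}

  static String tablePath(String cluster, String table) {
    return "/blur/" + cluster + "/tables/" + table;
  }

  static String familyPath(String cluster, String table, String family) {
    return tablePath(cluster, table) + "/column-families/" + family;
  }

  static String columnPath(String cluster, String table, String family, String column) {
    return familyPath(cluster, table, family) + "/" + column;
  }

  // Assumed resolution order: column, then family default, then table default.
  static String resolve(Map<String, String> nodes, String cluster, String table,
      String family, String column) {
    String[] candidates = {
      columnPath(cluster, table, family, column),
      familyPath(cluster, table, family) + "/default-column-definition",
      tablePath(cluster, table) + "/default-column-definition"
    };
    for (String path : candidates) {
      String value = nodes.get(path);
      if (value != null) {
        return value;
      }
    }
    return null; // no definition at any level
  }
}
```

One wrinkle with this layout: a column literally named `default-column-definition` would collide with the family default node.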
> >
> >
> > On Tue, Jul 16, 2013 at 6:06 PM, Aaron McCurry <[email protected]> wrote:
> >
> >> On Tue, Jul 16, 2013 at 1:24 AM, rahul challapalli <[email protected]> wrote:
> >>
> >>> Hi Aaron,
> >>>
> >>> I started looking into the functionality you already added. A few
> >>> observations:
> >>>
> >>> In the Blur.thrift file, AnalyzerDefinition is removed from the
> >>> TableDescriptor. Was this intentional? If so, can you give us an
> >>> example of how to use them?
> >>>
> >>
> >> Removing the AnalyzerDefinition was intentional. The motivation there is
> >> to allow the schema (families, columns, and types) to be set or added
> >> independently of the creation of the table. I have not created any new
> >> Thrift RPC calls to add new column definitions, but ultimately they will
> >> call addColumnDefinition on the FieldManager class.
> >>
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/FieldManager.java;h=2271726e55bb9356ca6f2b6edf7a5fdec46b36c4;hb=ae516a442767b31d2c7e29b07a78aa08ec246dcf
> >>
> >>
> >>> I modified Blur.thrift (Column and TableDescriptor) and generated the
> >>> code. I don't know how to handle scenarios where minor changes are
> >>> made and need to be pushed into the branch. Otherwise, won't it become
> >>> a big commit if we try to associate it with a specific JIRA ticket?
> >>>
> >>
> >> I think you should attach a patch to the JIRA ticket. I can review and
> >> merge it, and then we can work from the same baseline. We can repeat
> >> that process as many times as needed.
> >>
> >>
> >>>
> >>> I added a bunch of code to the MutationHelper class to validate
> >>> in-bound columns. Can you check whether my understanding is aligned
> >>> with the requirement?
> >>
> >>
> >>> public static Column validateColumn(String family, Column col,
> >>>     boolean strict, FieldManager fieldManager) {
> >>>   // In strict mode every in-bound column must declare its type.
> >>>   if (strict && col.type == null) {
> >>>     throw new RuntimeException("The type of the column is a required field"
> >>>         + " for this table. To turn off this behavior set strictTypes=false"
> >>>         + " on the TableDescriptor");
> >>>   }
> >>>
> >>>   // Look up any existing definition by its "family.column" name.
> >>>   FieldTypeDefinition fieldTypeDefinition =
> >>>       fieldManager.getFieldTypeDefinition(family + "." + col.name);
> >>>   if (fieldTypeDefinition == null) {
> >>>     // TODO dynamic column: add new column definition
> >>>     return col;
> >>>   }
> >>>
> >>>   // Reject a declared type that disagrees with the stored definition
> >>>   // (an untyped column is let through when strict is false).
> >>>   if (col.type != null
> >>>       && !fieldTypeDefinition.getName().equalsIgnoreCase(col.type)) {
> >>>     throw new RuntimeException("The type defined in the column does not"
> >>>         + " match the existing type definition");
> >>>   }
> >>>   return col;
> >>> }
> >>>
> >>
> >> Yes, this looks good, but just an FYI: I like to always throw
> >> BlurExceptions instead of RuntimeExceptions. The main reason for this
> >> (across the board) is that Thrift will wrap all exceptions that are not
> >> BlurExceptions or TExceptions in a TException. When this happens, the
> >> client thinks that something went wrong with the connection and will
> >> retry the call several times.
> >>
> >> Thanks!
> >>
> >> Aaron
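Following Aaron's advice, the validation could declare a checked exception instead of a `RuntimeException`. So that the sketch compiles on its own, `BlurException` below is a hypothetical stand-in for Blur's real class, and a `Map` from `family.column` to type name replaces `FieldManager`; `ColumnValidator` is likewise hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for Blur's BlurException, which Thrift passes through to clients.
class BlurException extends Exception {
  BlurException(String message) {
    super(message);
  }
}

final class ColumnValidator {
  private ColumnValidator() {}

  // fieldTypes maps "family.column" to a type name (stand-in for FieldManager).
  static void validate(Map<String, String> fieldTypes, String family, String column,
      String type, boolean strict) throws BlurException {
    String fieldName = family + "." + column;
    if (strict && type == null) {
      throw new BlurException("The type of column [" + fieldName + "] is required."
          + " To turn off this behavior set strictTypes=false on the TableDescriptor");
    }
    String existing = fieldTypes.get(fieldName);
    // Only a declared type that disagrees with the stored one is an error.
    if (existing != null && type != null && !existing.equalsIgnoreCase(type)) {
      throw new BlurException("Column [" + fieldName + "] has type [" + type
          + "] but the existing definition is [" + existing + "]");
    }
  }
}
```

Because the checked exception appears in the method signature, callers on the server side are forced to propagate it rather than letting an unchecked exception escape into Thrift's generic wrapping.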
> >>
> >>>
> >>>
> >>> - Rahul
> >>>
> >>>
> >>> On Tue, Jul 2, 2013 at 4:27 PM, Aaron McCurry <[email protected]> wrote:
> >>>
> >>>> I have created a new branch where I have been working on rewriting the
> >>>> type/analyzer system for what seems like the 3rd or 4th time, so
> >>>> hopefully it will turn out better this time.
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=shortlog;h=refs/heads/0.2.0-newtypesystem
> >>>>
> >>>> If you have a chance, I would love some feedback on what's been built
> >>>> thus far.
> >>>>
> >>>>
> >>>> The o.a.b.analysis package in the blur-query project:
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis;h=3db57e994d4e60cc81d94641482c69305767fab5;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> >>>>
> >>>> And the o.a.b.analysis.type package in the blur-query project:
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis/type;h=44ca6e1114210ffd8d202a29a347f7b77e37142f;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> >>>>
> >>>> The main classes to start looking at are BaseFieldManager and
> >>>> FieldTypeDefinition. They will lead you to several implementations.
> >>>> My hope is that this API will allow us to support the given types in
> >>>> Lucene, as well as allow others to create new FieldTypeDefinition(s)
> >>>> and extend Blur.
> >>>>
> >>>> Let me know what you think.  Thanks!
> >>>>
> >>>> Aaron
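The extension goal in the last message (built-in types plus user-defined ones) can be illustrated with a small name-keyed registry. This is a hypothetical sketch, not Blur's `FieldTypeDefinition` API: `SketchFieldType`, `LongType`, and `TypeRegistry` are invented names, and lookup is case-insensitive to match the `equalsIgnoreCase` check in `validateColumn`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extension point: a type knows its name and how to check a raw value.
interface SketchFieldType {
  String getName();

  boolean accepts(String value);
}

// Hypothetical built-in type.
class LongType implements SketchFieldType {
  public String getName() {
    return "long";
  }

  public boolean accepts(String value) {
    try {
      Long.parseLong(value);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }
}

// Hypothetical registry keyed by lower-cased type name.
final class TypeRegistry {
  private final Map<String, SketchFieldType> types = new ConcurrentHashMap<>();

  void register(SketchFieldType type) {
    String key = type.getName().toLowerCase(Locale.ROOT);
    // Refuse to silently replace an existing type.
    if (types.putIfAbsent(key, type) != null) {
      throw new IllegalArgumentException("Type already registered: " + type.getName());
    }
  }

  SketchFieldType get(String name) {
    return types.get(name.toLowerCase(Locale.ROOT));
  }
}
```

A third party would register its own implementation alongside the built-ins, and columns would then refer to it by name.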
> >>>>
> >>>
> >>
>
