Hi Aaron,

Thanks for the clarification. As for me, I really haven't made any progress. The direction I was working in was to store the analyzer definition in ZooKeeper in a different way, plus some minor changes to the Thrift interface (type, strictTypes), and I generated the code. I haven't been able to find time to contribute anything significant. If there is anything specific you want me to take up, do let me know.
- Rahul

On Wed, Jul 17, 2013 at 5:00 PM, Aaron McCurry <[email protected]> wrote:

> Rahul,
>
> After giving it some thought, I think we should store all the metadata about tables in HDFS. Let me explain why. We have run into issues where my project wants to remove a table from Blur but not delete the indexes (maybe because it's a test system with multiple versions of the same data). The problem is that you can't just import the table back into Blur, because the column definitions are stored in ZooKeeper and they have already been destroyed.
>
> That's why I have implemented an HDFS field manager that stores the metadata in HDFS. I don't really have a feel yet for how well this will work, but the base field manager is implemented to be storage agnostic, so any implementing subclass has to provide a way to store and load column definitions.
>
> Further, the model we are implementing is to write a column definition once per field and never modify it. I think this will fit well within HDFS's capabilities: HDFS enforces atomic file creation, so no two nodes can create the same column definition, at least with the way I have implemented it.
>
> Take a look at what's there and let me know what you think. Thanks!
>
> Aaron
>
> Sent from my iPad
>
> On Jul 17, 2013, at 11:53 AM, rahul challapalli <
> [email protected]> wrote:
>
> > Hi Aaron,
> >
> > Can you elaborate on your thoughts about how to store the AnalyzerDefinition in ZooKeeper?
> >
> > The example below is from my earlier notes.
> > Let me know what you think.
> >
> > /blur/default/tables/words/default-column-definition : value
> > /blur/default/tables/words/column-families/fam1/default-column-definition : value
> > /blur/default/tables/words/column-families/fam1/col1 : value
> > /blur/default/tables/words/column-families/fam1/col2 : value
> >
> > - Rahul
> >
> > On Tue, Jul 16, 2013 at 6:06 PM, Aaron McCurry <[email protected]> wrote:
> >
> >> On Tue, Jul 16, 2013 at 1:24 AM, rahul challapalli <
> >> [email protected]> wrote:
> >>
> >>> Hi Aaron,
> >>>
> >>> I started looking into the functionality you already added. A few observations:
> >>>
> >>> In the Blur.thrift file, AnalyzerDefinition is removed from the TableDescriptor. Was this intentional? If so, can you give us an example of how to use them?
> >>
> >> Removing the AnalyzerDefinition was intentional. The motivation there is to allow the schema (families, columns, and types) to be set or added independently of the creation of the table. I have not created any new Thrift RPC calls to add new column definitions, but ultimately they will call addColumnDefinition on the FieldManager class.
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/FieldManager.java;h=2271726e55bb9356ca6f2b6edf7a5fdec46b36c4;hb=ae516a442767b31d2c7e29b07a78aa08ec246dcf
> >>
> >>> I modified Blur.thrift (Column and TableDescriptor) and generated the code. I don't know how to handle scenarios where minor changes are made and need to be pushed into the branch.
> >>> Otherwise it becomes a big commit if we try to associate it with a specific JIRA ticket?
> >>
> >> I think that you should attach a patch to the JIRA ticket. I can review and merge it, and then we can work from the same baseline. Then we can repeat that process as many times as needed.
> >>
> >>> I added a bunch of code to the MutationHelper class to validate in-bound columns. Can you check whether my understanding is aligned with the requirement?
> >>>
> >>> public static Column validateColumn(String family, Column col,
> >>>     boolean strict, FieldManager fieldManager) {
> >>>   if (strict && col.type == null) {
> >>>     throw new RuntimeException("The type of the column is a required field for this table. To turn off this behavior set strictTypes=false on the TableDescriptor");
> >>>   }
> >>>   FieldTypeDefinition fieldTypeDefinition =
> >>>       fieldManager.getFieldTypeDefinition(family + "." + col.name);
> >>>   if (fieldTypeDefinition == null) {
> >>>     // TODO dynamic column : add new column definition
> >>>     return col;
> >>>   }
> >>>   if (!fieldTypeDefinition.getName().equalsIgnoreCase(col.type)) {
> >>>     throw new RuntimeException("The type defined in the column does not match the existing type definition");
> >>>   }
> >>>   return col;
> >>> }
> >>
> >> Yes, this looks good, but just an FYI: I like to always throw BlurExceptions instead of RuntimeExceptions. The main reason for this (across the board) is that Thrift will wrap all exceptions that are not BlurExceptions or TExceptions in a TException. When this happens, the client thinks that something went wrong with the connection and will retry the call several times.
> >>
> >> Thanks!
> >>
> >> Aaron
> >>
> >>> - Rahul
> >>>
> >>> On Tue, Jul 2, 2013 at 4:27 PM, Aaron McCurry <[email protected]> wrote:
> >>>
> >>>> I have created a new branch where I have been working on rewriting the type/analyzer system for what seems like the 3rd or 4th time, so hopefully it will turn out better this time.
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=shortlog;h=refs/heads/0.2.0-newtypesystem
> >>>>
> >>>> If you have a chance, I would love some feedback on what's been built thus far.
> >>>>
> >>>> The o.a.b.analysis package in the blur-query project:
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis;h=3db57e994d4e60cc81d94641482c69305767fab5;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> >>>>
> >>>> And the o.a.b.analysis.type package in the blur-query project:
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis/type;h=44ca6e1114210ffd8d202a29a347f7b77e37142f;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> >>>>
> >>>> The main classes to start looking at are BaseFieldManager and FieldTypeDefinition. They will lead you to several implementations. My hope is that this API will allow us to support the given types in Lucene as well as allow others to create new FieldTypeDefinition(s) and extend Blur.
> >>>>
> >>>> Let me know what you think. Thanks!
> >>>>
> >>>> Aaron
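In his Jul 16 reply, Aaron suggests throwing BlurException rather than RuntimeException from validateColumn. Here is a minimal, self-contained sketch of that change. Column, FieldTypeDefinition, FieldManager and BlurException are hypothetical stand-ins: the real Blur classes, including the Thrift-generated BlurException, have different fields and constructors. ColumnValidator is an illustrative name.

The sketch differs deliberately from the snippet in the thread in one way. When strict is false and the column carries no type, the sketch accepts the column. It does not compare a null type against the stored definition.

```java
/** Hypothetical stand-in for Blur's exception; the real one is Thrift-generated. */
class BlurException extends Exception {
  BlurException(String message) {
    super(message);
  }
}

/** Hypothetical stand-in for the Column struct: just a name and an optional type. */
class Column {
  final String name;
  final String type; // null when the client does not send a type

  Column(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

/** Hypothetical stand-in for a type definition: only the name is needed here. */
interface FieldTypeDefinition {
  String getName();
}

/** Hypothetical stand-in for the field manager lookup used during validation. */
interface FieldManager {
  FieldTypeDefinition getFieldTypeDefinition(String fieldName);
}

final class ColumnValidator {
  private ColumnValidator() {}

  static Column validateColumn(String family, Column col, boolean strict,
      FieldManager fieldManager) throws BlurException {
    if (strict && col.type == null) {
      throw new BlurException("The type of the column is a required field for this table. "
          + "To turn off this behavior set strictTypes=false on the TableDescriptor");
    }
    String fieldName = family + "." + col.name;
    FieldTypeDefinition existing = fieldManager.getFieldTypeDefinition(fieldName);
    if (existing == null) {
      // Unknown column: this is where a dynamic column definition would be added.
      return col;
    }
    // A missing type (only possible when strict is false) defers to the stored definition.
    if (col.type != null && !existing.getName().equalsIgnoreCase(col.type)) {
      throw new BlurException("Column [" + fieldName + "] has type [" + col.type
          + "] but is already defined as [" + existing.getName() + "]");
    }
    return col;
  }
}
```

The stand-in BlurException is a checked exception. As a result, the compiler flags any caller that neither declares nor handles it, so a validation failure cannot escape unnoticed as an unchecked exception.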
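Rahul's Jul 17 notes propose one znode per column, plus default-column-definition nodes at the table and family levels. The helper below builds those paths and resolves a definition. It is a hypothetical sketch built on three assumptions:

- A Map stands in for ZooKeeper.
- The `default` segment in the path is the cluster name.
- Lookup falls back from the column, to the family default, to the table default. The notes do not spell out this order.

```java
import java.util.Map;

/**
 * Hypothetical helper for the ZooKeeper layout in the notes above.
 * A Map stands in for ZooKeeper (znode path -> stored value).
 */
class ZkColumnDefinitionLayout {
  private final String tablePath;

  ZkColumnDefinitionLayout(String cluster, String table) {
    this.tablePath = "/blur/" + cluster + "/tables/" + table;
  }

  String tableDefaultPath() {
    return tablePath + "/default-column-definition";
  }

  String familyDefaultPath(String family) {
    return tablePath + "/column-families/" + family + "/default-column-definition";
  }

  String columnPath(String family, String column) {
    return tablePath + "/column-families/" + family + "/" + column;
  }

  /**
   * Assumed lookup order: the column's own node, then the family default,
   * then the table default. Returns null if none of them is set.
   */
  String resolve(Map<String, String> zk, String family, String column) {
    String def = zk.get(columnPath(family, column));
    if (def == null) {
      def = zk.get(familyDefaultPath(family));
    }
    if (def == null) {
      def = zk.get(tableDefaultPath());
    }
    return def;
  }
}
```

This layout has one side effect. A column literally named default-column-definition would share a path with its family's default node, so that name would have to be reserved.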
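In his Jul 17 message, Aaron describes a storage-agnostic base field manager with two properties:

- Subclasses only store and load column definitions.
- Each definition is written once and never modified.

The sketch below illustrates that split under a few assumptions:

- ColumnDefinitionStore, tryToStore and tryToLoad are hypothetical names.
- Definitions are plain strings.
- The subclass uses the local filesystem rather than HDFS.

StandardOpenOption.CREATE_NEW plays the role here that an atomic, non-overwriting create plays in HDFS (for example, FileSystem.create(path, false) in the Hadoop API).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical storage-agnostic base class: the write-once rule lives here,
 * and subclasses only decide how a column definition is stored and loaded.
 */
abstract class ColumnDefinitionStore {

  /** Atomically create the definition; return false if it already exists. */
  protected abstract boolean tryToStore(String fieldName, String definition) throws IOException;

  /** Return the stored definition, or null if the field is not defined. */
  protected abstract String tryToLoad(String fieldName) throws IOException;

  /**
   * Stores a definition at most once per field. Returns true if this call
   * created it and false if it already existed; an existing definition is
   * never overwritten.
   */
  public final boolean addColumnDefinition(String fieldName, String definition)
      throws IOException {
    if (fieldName == null || definition == null) {
      throw new IllegalArgumentException("fieldName and definition are required");
    }
    return tryToStore(fieldName, definition);
  }

  public final String getColumnDefinition(String fieldName) throws IOException {
    return tryToLoad(fieldName);
  }
}

/**
 * Local-filesystem stand-in for an HDFS-backed subclass. CREATE_NEW makes the
 * existence check and the file creation one atomic step, so only one caller
 * can create a given field's file.
 */
class LocalFileColumnDefinitionStore extends ColumnDefinitionStore {
  private final Path dir;

  LocalFileColumnDefinitionStore(Path dir) {
    this.dir = dir;
  }

  @Override
  protected boolean tryToStore(String fieldName, String definition) throws IOException {
    try {
      Files.write(dir.resolve(fieldName), definition.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false; // another caller created this field's definition first
    }
  }

  @Override
  protected String tryToLoad(String fieldName) throws IOException {
    Path file = dir.resolve(fieldName);
    if (!Files.exists(file)) {
      return null;
    }
    return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }
}
```

Only file creation is atomic in this sketch. A reader that races the first writer may briefly see an empty or partially written file, so a fuller version would need to handle that case, for example by retrying the read.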
