The type system is probably the largest change to Blur since we moved to Lucene 4, so I know it's a bit of a moving target. Any contribution is significant; even just discussing the tasks in JIRA is very helpful. If you see any tasks in JIRA that catch your eye, I am more than happy to help you work on them. Thanks again!

Aaron

On Thu, Jul 18, 2013 at 7:33 PM, rahul challapalli <[email protected]> wrote:

> Hi Aaron,
>
> Thanks for the clarification. As for me, I really haven't made any progress. The direction I worked in was to store the analyzer definition in ZooKeeper in a different way, make some minor changes to the Thrift interface (type, stricttypes), and generate the code. I am not able to find time to contribute anything significant. If there is anything specific you want me to take up, do let me know.
>
> - Rahul
>
> On Wed, Jul 17, 2013 at 5:00 PM, Aaron McCurry <[email protected]> wrote:
>
> > Rahul,
> >
> > After giving it some thought, I think we should store all the metadata about tables in HDFS. Let me explain why. We have run into issues where my project wants to remove a table from Blur but not delete the indexes (maybe because it's a test system with multiple versions of the same data). However, the problem is that you can't just import the table back into Blur, because the column definitions are stored in ZooKeeper and they have already been destroyed.
> >
> > That's why I have implemented an HDFS field manager that stores the metadata in HDFS. I don't really have a feel yet for how well this will work, but the base field manager is implemented to be storage agnostic, so any implementing subclass has to implement a way to store and load column definitions.
> >
> > Further, the model we are implementing is to write a column definition once per field and never modify it. I think this will fit well within HDFS's capabilities, because HDFS enforces atomic file creation, so no two nodes can create the same column definition, at least with the way I have implemented it.
> >
> > Take a look at what's there and let me know what you think. Thanks!
> >
> > Aaron
> >
> > Sent from my iPad
> >
> > On Jul 17, 2013, at 11:53 AM, rahul challapalli <[email protected]> wrote:
> >
> > > Hi Aaron,
> > >
> > > Can you elaborate on your thoughts about how to store the Analyzer Definition in ZooKeeper?
> > >
> > > The example below is from my past notes. Let me know what you think.
> > >
> > > /blur/default/tables/words/default-column-definition : value
> > > /blur/default/tables/words/column-families/fam1/default-column-definition : value
> > > /blur/default/tables/words/column-families/fam1/col1 : value
> > > /blur/default/tables/words/column-families/fam1/col2 : value
> > >
> > > - Rahul
> > >
> > > On Tue, Jul 16, 2013 at 6:06 PM, Aaron McCurry <[email protected]> wrote:
> > >
> > >> On Tue, Jul 16, 2013 at 1:24 AM, rahul challapalli <[email protected]> wrote:
> > >>
> > >>> Hi Aaron,
> > >>>
> > >>> I started looking into the functionality you already added. A few observations:
> > >>>
> > >>> In the Blur.thrift file, AnalyzerDefinition is removed from the TableDescriptor. Was this intentional? If so, can you give us an example of how to use them?
> > >>>
> > >>
> > >> Removing the AnalyzerDefinition was intentional. The motivation there is to allow the schema (families, columns, and types) to be set or added independently of the creation of the table. I have not created any new Thrift RPC calls to add new column definitions, but ultimately such a call will invoke addColumnDefinition on the FieldManager class.
> > >>
> > >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/FieldManager.java;h=2271726e55bb9356ca6f2b6edf7a5fdec46b36c4;hb=ae516a442767b31d2c7e29b07a78aa08ec246dcf
> > >>
> > >>> I modified Blur.thrift (Column and TableDescriptor) and generated the code. I don't know how to handle scenarios where minor changes are made and need to be pushed into the branch. Otherwise it becomes a big commit if we try to associate it with a specific JIRA ticket?
> > >>>
> > >>
> > >> I think that you should attach a patch to the JIRA ticket. I can review and merge it, and then we can work from the same baseline. We can repeat that process as many times as needed.
> > >>
> > >>> I added a bunch of code to the MutationHelper class to validate inbound columns. Can you check whether my understanding is aligned with the requirement?
> > >>>
> > >>> public static Column validateColumn(String family, Column col,
> > >>>     boolean strict, FieldManager fieldManager) {
> > >>>   if (strict == true) {
> > >>>     if (col.type == null) {
> > >>>       throw new RuntimeException("The type of the column is a required field for this table. To turn off this behavior set strictTypes=false on the TableDescriptor");
> > >>>     }
> > >>>   }
> > >>>
> > >>>   FieldTypeDefinition fieldTypeDefinition =
> > >>>       fieldManager.getFieldTypeDefinition(family + "." + col.name);
> > >>>   if (fieldTypeDefinition == null) {
> > >>>     // TODO dynamic column : add new column definition
> > >>>     return col;
> > >>>   }
> > >>>   if (!fieldTypeDefinition.getName().equalsIgnoreCase(col.type)) {
> > >>>     throw new RuntimeException("The type defined in the column does not match the existing type definition");
> > >>>   }
> > >>>   return col;
> > >>> }
> > >>>
> > >>
> > >> Yes, this looks good, but just an FYI: I like to always throw BlurExceptions instead of RuntimeExceptions. The main reason for this (across the board) is that Thrift will wrap all exceptions that are not BlurExceptions or TExceptions in a TException. When this happens, the client thinks that something went wrong with the connection and will retry the call several times.
> > >>
> > >> Thanks!
> > >>
> > >> Aaron
> > >>
> > >>> - Rahul
> > >>>
> > >>> On Tue, Jul 2, 2013 at 4:27 PM, Aaron McCurry <[email protected]> wrote:
> > >>>
> > >>>> I have created a new branch where I have been working on rewriting the type/analyzer system for what seems like the 3rd or 4th time. So hopefully it will turn out better this time.
> > >>>>
> > >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=shortlog;h=refs/heads/0.2.0-newtypesystem
> > >>>>
> > >>>> If you have a chance, I would love some feedback on what's been built thus far.
> > >>>>
> > >>>> The o.a.b.analysis package in the blur-query project:
> > >>>>
> > >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis;h=3db57e994d4e60cc81d94641482c69305767fab5;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> > >>>>
> > >>>> And the o.a.b.analysis.type package in the blur-query project:
> > >>>>
> > >>>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-query/src/main/java/org/apache/blur/analysis/type;h=44ca6e1114210ffd8d202a29a347f7b77e37142f;hb=4ebe74ef2e489d8a360220e0d2752c682042ab22
> > >>>>
> > >>>> The main classes to start looking at are BaseFieldManager and FieldTypeDefinition. They will lead you to several implementations. My hope is that this API will allow us to support the given types in Lucene as well as allow others to create new FieldTypeDefinition(s) and extend Blur.
> > >>>>
> > >>>> Let me know what you think. Thanks!
> > >>>>
> > >>>> Aaron
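
The write-once idea from the Jul 17 message (a column definition is created once per field and never modified, relying on atomic file creation so that two nodes cannot both create it) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the local filesystem through java.nio.file as a stand-in for HDFS, and the class name WriteOnceColumnStore, its methods, and the "family.column" file naming are all hypothetical. It is not Blur's HDFS field manager.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: one file per column definition, created at most once.
// The local filesystem stands in for HDFS here (assumption).
class WriteOnceColumnStore {
  private final Path dir;

  WriteOnceColumnStore(Path dir) {
    this.dir = dir;
  }

  // Convenience factory that stores definitions in a fresh temp directory.
  static WriteOnceColumnStore inTempDir() {
    try {
      return new WriteOnceColumnStore(Files.createTempDirectory("coldefs"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true if this caller created the definition, false if it already existed.
  boolean tryCreate(String family, String column, String type) {
    Path file = dir.resolve(family + "." + column);
    try {
      // CREATE_NEW makes the existence check and the creation a single atomic
      // step, so only one of several racing writers can succeed.
      // The content is written after creation, so a concurrent reader could
      // briefly observe an empty file; a real implementation would handle that.
      Files.write(file, type.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false; // someone else defined this column first; never overwrite
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns the stored type name, or null if the column has no definition yet.
  String load(String family, String column) {
    Path file = dir.resolve(family + "." + column);
    try {
      if (!Files.exists(file)) {
        return null;
      }
      return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    WriteOnceColumnStore store = WriteOnceColumnStore.inTempDir();
    System.out.println(store.tryCreate("fam1", "col1", "text"));   // first writer wins
    System.out.println(store.tryCreate("fam1", "col1", "string")); // later writer is rejected
    System.out.println(store.load("fam1", "col1"));
  }
}
```

Because definitions are stored next to the data rather than in ZooKeeper, removing a table's registration would not destroy them, which is the re-import problem the message describes. On HDFS the analogous primitive would presumably be creating the file with overwrite disabled, but that is an assumption and not something the thread confirms.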
