I'm going to put some code together today so that we can take a look at different issues and see what they look like.
Aaron

On Mon, Oct 14, 2013 at 2:51 AM, Otis Gospodnetic <[email protected]> wrote:

> Hi,
>
> I missed emails/issues where this functionality is described, so I'm
> commenting only on naming, trying to point out possible confusion with
> other search projects. "Collection" in Solr has a specific meaning -
> it's a logical index in a Solr(Cloud) cluster. So maybe that term
> could be avoided here, too.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
> On Sat, Oct 12, 2013 at 2:45 PM, Aaron McCurry <[email protected]> wrote:
> > Perhaps, but the interesting thing is that I think that grouping
> > functionality is actually very similar. It's just a static structure
> > instead of being dynamic. At least if I understand the Solr feature
> > correctly.
> >
> > Maybe we should call it a DocumentCollection, since it's a collection
> > of documents.
> >
> > Aaron
> >
> > On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Note that Solr and Lucene both have grouping functionality, which some
> >> people may confuse with the DocGroups you are talking about here.
> >>
> >> Otis
> >>
> >> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <[email protected]> wrote:
> >> > While I don't really like the idea of changing all the code to rename
> >> > Row and Record, I think it is necessary to help people who are new to
> >> > Blur transition from Lucene (or any other document store for that
> >> > matter).
> >> >
> >> > I think that having Doc and DocGroup both be first class objects is
> >> > also critical. I think that for most implementations DocGroup is
> >> > overkill and Document is the only thing needed. I have some ideas on
> >> > how to make this possible in the API.
> >> >
> >> > Here's an example of what we could do. This is raw Thrift, which can
> >> > be ugly, but with some helper/utility classes it can be made better:
> >> >
> >> > Doc doc = new Doc();
> >> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
> >> > doc.addToFields(new Field("int_fieldname", new Value(_Fields.INT_VAL, 1234)));
> >> > doc.addToFields(new Field("string_fieldname", new Value(_Fields.STRING_VAL, "value1")));
> >> > doc.addToFields(new Field("text_fieldname", new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
> >> >
> >> > DocGroup docGroup = new DocGroup();
> >> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
> >> > docGroup.addToDocs(doc);
> >> >
> >> > At this point I think I would like to keep the docId and docGroupId.
> >> > I know that Lucene itself doesn't require them, but if we don't have
> >> > them, deletes/updates become a lot more expensive. They would have to
> >> > broadcast the delete to all the shards of a table, which would kill
> >> > NRT updates.
> >> >
> >> > Thoughts?
> >> >
> >> > Aaron
> >> >
> >> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton <[email protected]> wrote:
> >> >
> >> >> +1 here.
> >> >>
> >> >> I also agree with Colton about making docgroup/row optional. I know
> >> >> in the current design it's not easy, but I remember Aaron saying that
> >> >> in the branch it might be possible to specify any column as the ID,
> >> >> making me think it might be possible to not have one at all.
> >> >>
> >> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <[email protected]> wrote:
> >> >>
> >> >> > I personally think that the Row/Record/Column model makes sense. If
> >> >> > you have some documentation on the site saying here are the Lucene
> >> >> > equivalents to Blur, it would probably avoid having those types of
> >> >> > questions in the future. If you have an explanation of this, you
> >> >> > could leave the model the same to avoid having to make a bunch of
> >> >> > changes and cause chaos.
> >> >> >
> >> >> > Glad the Family attribute is being dropped. I kinda came in at the
> >> >> > end of its lifespan I guess, because it doesn't really make much
> >> >> > sense to me. How long till it's actually dropped from the code
> >> >> > though?
> >> >> >
> >> >> > One thing I would like to see is Row be an option. In my current
> >> >> > implementation of Lucene code I don't use them at all, because what
> >> >> > I am working with makes no sense to have rows really. I also don't
> >> >> > recall DocGroups being required in Lucene, and I never worked with
> >> >> > them, so that kinda threw me off when I ran into it.
> >> >> >
> >> >> > Thanks,
> >> >> > Colton McInroy
> >> >> >
> >> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
> >> >> >
> >> >> >> Hi Devs,
> >> >> >> I'm wondering if we should go ahead and endure the [painful] move
> >> >> >> to a more intuitive data model in Blur? Here are some observations:
> >> >> >>
> >> >> >> 1) New folks coming to Blur have a background in Lucene - not
> >> >> >> necessarily a NoSQL data store - and want to know where their
> >> >> >> "Documents" are.
> >> >> >>
> >> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
> >> >> >> misleading in terms of design tradeoffs.
> >> >> >>
> >> >> >> 3) The Row/Record model seems to bring a significant explanation
> >> >> >> burden.
> >> >> >>
> >> >> >> In the past we've talked about a model that's more aligned with
> >> >> >> Lucene's Documents. Aaron did some API work on a branch a while
> >> >> >> back and it's come up in an issue again recently.
> >> >> >>
> >> >> >> So, I'm wondering if now is the time to just endure some shortish
> >> >> >> period of pain changing everything over now? The idea being
> >> >> >> something like:
> >> >> >>
> >> >> >> Row -> DocGroup
> >> >> >> Record -> Document
> >> >> >> Column -> Field
> >> >> >> Family -> (dropped)
> >> >> >>
> >> >> >> I think this will alleviate some confusion and provide a solid
> >> >> >> foundation for the long term; enabling a shorter learning curve
> >> >> >> and less confusion.
> >> >> >>
> >> >> >> Such a big change would be good to get done while we're still a
> >> >> >> small-ish community, but I think it's important that everyone is
> >> >> >> on board - as it will no doubt create lots of short term chaos and
> >> >> >> confusion...
> >> >> >>
> >> >> >> Thoughts?
> >> >> >>
> >> >> >> Thanks,
> >> >> >> --tim

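The point in the thread about keeping docId and docGroupId (without them, a delete has to be broadcast to every shard of a table) can be illustrated with a small sketch. This is not Blur's code: the class name, both methods, and the hash-modulo routing rule are assumptions made only for illustration, since the thread does not say how Blur actually assigns ids to shards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: how many shards a delete must touch,
// with and without an id to route on.
public class ShardRoutingSketch {

    // Assumed routing rule: hash the id onto one of numShards shards.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    // Returns the shards a delete request has to visit.
    // With an id, exactly one shard; without one (null), every shard.
    static List<Integer> shardsForDelete(String docGroupId, int numShards) {
        List<Integer> targets = new ArrayList<>();
        if (docGroupId != null) {
            targets.add(shardFor(docGroupId, numShards));
        } else {
            for (int shard = 0; shard < numShards; shard++) {
                targets.add(shard);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        int numShards = 8;
        // One target shard when the id is known.
        System.out.println("with id:    " + shardsForDelete("groupid12345", numShards));
        // All eight shards when there is no id to route on.
        System.out.println("without id: " + shardsForDelete(null, numShards));
    }
}
```

Under these assumptions the cost of an id-less delete grows with the number of shards in the table, while a routed delete always touches one shard, which is the trade-off the thread describes.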