I'm not sure I understand how they'd be out of order in the iterator if
they aren't out of order in the underlying source. How would your
iterator return:

((p0, d, (r15, prop_b)), "just testing"),
((p0, d, (r8, prop_b)), "hello, world")

when the underlying data is:

p0 | d | (r8, prop_b)  | hello, world
p0 | d | (r15, prop_b) | just testing

? Why would it reorder the existing entries?

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Thu, Apr 2, 2015 at 1:52 PM, Russ Weeks <[email protected]> wrote:

> Thanks for your response, Christopher.
>
> Yes, I see what you mean by promoting the CQ to the CF. I thought that
> would simplify things but if not I could definitely return (k,v) pairs like,
>
> ((p0, d, (r15, prop_b)), "just testing"),
> ((p0, d, (r8, prop_b)), "hello, world")
>
> (leaving the timestamps and visibilities out for clarity, and assuming that
> r15 and r8 are encoded such that their lexical order matches their numeric
> order)
>
> Leaving the existing schema intact is not a problem for me but it doesn't
> get around the fact that the (k,v) pairs would be returned out-of-order by
> the iterator. I guess another option would be to embed the "data" kv pairs
> into the "index" kv pairs somehow, such as:
>
> ((p0, i, ((prop_a, 7, r15), prop_b)), "just testing")
> ((p0, i, ((prop_a, 8, r8), prop_b)), "hello, world")
>
> I'm not too keen on that solution but if you're telling me that I shouldn't
> rely on scan-time iterators being able to emit data out of order... I think
> it's the least bad option.
>
> Regards,
> -Russ
>
> On Thu, Apr 2, 2015 at 12:04 AM, Christopher <[email protected]> wrote:
>
>> So in your example, you actually did return them in order (lexically, not
>> numerically), but I grok the idea that they might not be.
>>
>> The problem is that your transformation promotes a portion of the cq to the
>> cf. That's fine if what your iterator is returning includes only that from
>> a single cf (say, the 'data' cf). But otherwise, you could get duplicates
>> or out of order results, which can mess up the client's expectations when
>> retrieving batches from the servers. It could work in some limited cases,
>> but I'd avoid it.
>>
>> Instead, why not preserve order by preserving the existing schema, and just
>> ignore the unused cf in the client?
>>
>> On Thu, Apr 2, 2015, 00:28 Russ Weeks <[email protected]> wrote:
>>
>> > Thanks, Christopher. It's nice to hear an unambiguous point of view :)
>> >
>> > Do you see any alternative way of implementing a range scan on a
>> > partitioned index? The problem does not exist for exact-match scans
>> > because the row ID in the index entry CQ provides the correct ordering.
>> >
>> > Thanks,
>> > -Russ
>> >
>> > On Wed, Apr 1, 2015 at 9:11 PM, Christopher <[email protected]> wrote:
>> >
>> > > You should definitely not rely on this behavior. It goes against best
>> > > practices and is prone to error. It is not recommended.
>> > >
>> > > On Wed, Apr 1, 2015, 20:03 Russ Weeks <[email protected]> wrote:
>> > >
>> > > > A wonderful property of scan-time iterators is that they can emit row
>> > > > IDs in arbitrary order. Before I go off and build an index that relies
>> > > > on this behaviour, I'd like to get a sense of how likely it is to exist
>> > > > in future versions of Accumulo.
>> > > >
>> > > > I'd like to build an index like this (hopefully the ascii comes
>> > > > through, if not check here
>> > > > <https://gist.github.com/anonymous/1a64114da4b68a2ec822>):
>> > > >
>> > > > row | cf | cq               | val
>> > > > -------------------------------------------------
>> > > > p0  | i  | (prop_a, 7, r15) | 1
>> > > > p0  | i  | (prop_a, 8, r8)  | 1
>> > > > p0  | i  | (prop_a, 9, r19) | 1
>> > > > [...snip...]
>> > > > p0  | d  | (r8, prop_a)     | 8
>> > > > p0  | d  | (r8, prop_b)     | hello, world
>> > > > p0  | d  | (r15, prop_a)    | 7
>> > > > p0  | d  | (r15, prop_b)    | just testing
>> > > > p0  | d  | (r19, prop_a)    | 9
>> > > > p0  | d  | (r19, prop_b)    | something else
>> > > >
>> > > > Which is a pretty conventional partitioned index. I'd like to be able
>> > > > to issue a query like, "Tell me about prop_b for all documents where
>> > > > prop_a < 9" but I'm pretty sure that the only way this could work at
>> > > > scale is if it's OK for the iterator to return
>> > > > (p0, r15, prop_b, "just testing") followed by
>> > > > (p0, r8, prop_b, "hello, world").
>> > > >
>> > > > This works today - if you folks see any flaws in my reasoning please
>> > > > let me know - my question is, do you see this as functionality that
>> > > > should be preserved in the future?
>> > > >
>> > > > Thanks,
>> > > > -Russ
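
For readers following the thread, here is a rough sketch of why the index-driven range scan emits data keys out of sorted order. It is plain Python over an in-memory model of partition p0, not the Accumulo iterator API. The helper names (rid, range_scan) and the zero-padded row ID encoding are assumptions made for illustration; the thread only says that r8 and r15 are encoded so that lexical order matches numeric order.

```python
# In-memory model of partition p0 from the thread (not the Accumulo API).

def rid(n):
    # Hypothetical encoding: zero-pad so lexical order matches numeric order.
    return f"r{n:04d}"

# Index entries (row, cf, cq), kept sorted the way a tablet would store them.
index = sorted([
    ("p0", "i", ("prop_a", 7, rid(15))),
    ("p0", "i", ("prop_a", 8, rid(8))),
    ("p0", "i", ("prop_a", 9, rid(19))),
])

# Data entries keyed by (row, cf, cq).
data = {
    ("p0", "d", (rid(8), "prop_a")): "8",
    ("p0", "d", (rid(8), "prop_b")): "hello, world",
    ("p0", "d", (rid(15), "prop_a")): "7",
    ("p0", "d", (rid(15), "prop_b")): "just testing",
    ("p0", "d", (rid(19), "prop_a")): "9",
    ("p0", "d", (rid(19), "prop_b")): "something else",
}

def range_scan(index, data, prop, upper, wanted):
    """Walk index entries with prop < upper and fetch `wanted` for each row ID."""
    out = []
    for row, _cf, (p, value, r) in index:
        if p == prop and value < upper:
            key = (row, "d", (r, wanted))
            out.append((key, data[key]))
    return out

# "Tell me about prop_b for all documents where prop_a < 9"
emitted = range_scan(index, data, "prop_a", 9, "prop_b")
keys = [k for k, _ in emitted]
is_sorted = keys == sorted(keys)
print(is_sorted)  # False: the r15 entry is emitted before the r8 entry
```

The results follow index order (prop_a value 7, then 8), so the data key for r15 comes out before the one for r8. A plain scan over the data column family alone would return them in key order, r8 first.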
