[
https://issues.apache.org/jira/browse/PHOENIX-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887473#comment-16887473
]
James Taylor commented on PHOENIX-5290:
---------------------------------------
First, a big THANK YOU, [~lhofhansl] for spearheading this effort to get the
test runs passing again. You've done an amazing job!
{quote}Do multiple expressions cause a separator byte *each at the end*?!
{quote}
Yes, potentially. The reason is that you don't know as you're traversing when
you encounter a null/empty value if it'll be at the end - there might be more
expressions afterwards that contain values (in which case you'd keep the null
values).
For example, if you have four varchar expressions of `a`,`b`,`\0`,`c`, you'd
end up keeping the `\0` byte. If you have `a`,`b`,`\0`,`\0`, you'd end up
trimming the last two (and note I've left off the trailing `\0` you'd have
after each value above as well).
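A minimal sketch of the trimming described above, assuming UTF-8 varchar values, a single `\0` separator written after every value, and a null value encoded as zero bytes. The class `RowKeySketch` and the method `buildKey` are hypothetical names for illustration only and are not Phoenix's actual code:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowKeySketch {
    // Assumption: variable-length values are terminated by a zero byte.
    static final byte SEPARATOR = 0;

    // Hypothetical helper: writes each value followed by a separator, then
    // trims every trailing separator byte. A null value contributes no bytes
    // of its own, only its separator.
    static byte[] buildKey(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            if (v != null) {
                byte[] b = v.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            out.write(SEPARATOR);
        }
        byte[] key = out.toByteArray();
        int len = key.length;
        // Trimming can only happen once all values are written, since a
        // null in the middle must keep its separator.
        while (len > 0 && key[len - 1] == SEPARATOR) {
            len--;
        }
        return Arrays.copyOf(key, len);
    }

    public static void main(String[] args) {
        // Null in the middle is kept: a \0 b \0 \0 c
        System.out.println(Arrays.toString(buildKey("a", "b", null, "c")));
        // Trailing nulls are trimmed: a \0 b
        System.out.println(Arrays.toString(buildKey("a", "b", null, null)));
    }
}
```

The loop cannot decide while traversing whether a null will end up trailing, which is why the trim is a separate pass at the end.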
[~dbwong] - it'd be good if there were more abstraction around this on the
"creating the bytes" side of things - there's too much duplication of
similar code. The abstraction on the read side is a little bit better with the
RowKeySchemaAccessor class.
{quote}We append separators somewhat blindly after writing each field when
generating the row key. Since row keys have length it is unnecessary to store
the trailing key so we trim it. If we were to have multiple variable length
fields, (or nullable secondary indexes) we could in theory trim those as well.
{quote}
There are very specific and crucial reasons for generating the row keys as we
do and I wouldn't recommend changing that (both for the b/w compat nightmare
it'd cause and so as not to introduce a correctness issue). Some thorough
documentation would be good, though. The binary sorting of rows must match the
natural sort order of the rows (which is hard to get right, especially with
descending row keys thrown in the mix). Row keys do not store the length - we
rely on either the fixed length width (based on the type), or the separator
byte. Trailing null bytes must be trimmed in order for queries on preceding
columns to work correctly. This comes into play especially when you add a new
column to the row key without requiring existing rows to be changed.
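To illustrate why the trimming matters when a column is added, here is a rough sketch under simplified assumptions: ascending varchar columns only, a `\0` separator, and unsigned byte-wise comparison of keys. `TrimmedKeyCompatSketch`, `encode`, and `compareUnsigned` are hypothetical names, not Phoenix APIs, and fixed-width or descending columns are not covered:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TrimmedKeyCompatSketch {
    // Hypothetical encoder: value bytes plus a zero separator per column,
    // with all trailing zero bytes trimmed afterwards.
    static byte[] encode(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            if (v != null) {
                byte[] b = v.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            out.write(0);
        }
        byte[] key = out.toByteArray();
        int len = key.length;
        while (len > 0 && key[len - 1] == 0) {
            len--;
        }
        return Arrays.copyOf(key, len);
    }

    // Unsigned lexicographic comparison; a shorter key that is a prefix of
    // a longer one sorts first.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        byte[] oldRow = encode("a", "b");             // written with two key columns
        byte[] sameRowNewSchema = encode("a", "b", null); // third column added later, null
        byte[] newRow = encode("a", "b", "c");        // third column populated

        // The existing row is addressed by the same bytes under the new schema.
        System.out.println(Arrays.equals(oldRow, sameRowNewSchema));
        // The row with a null third column sorts before the populated one.
        System.out.println(compareUnsigned(oldRow, newRow) < 0);
    }
}
```

In this sketch a row written before the column was added and a row with a null in the new column produce identical keys, so existing rows do not need to be rewritten.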
> HashJoinMoreIT is flapping
> --------------------------
>
> Key: PHOENIX-5290
> URL: https://issues.apache.org/jira/browse/PHOENIX-5290
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.15.0, 4.14.1, 5.1.0
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Major
> Fix For: 4.15.0, 5.1.0
>
> Attachments: 5290-combined.txt, 5290-failure.txt, 5290-v2.txt,
> 5290-v3.txt, 5290.txt
>
>
> {code}
> [INFO] Running org.apache.phoenix.end2end.join.HashJoinMoreIT
> [ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 91.509 s <<< FAILURE! - in org.apache.phoenix.end2end.join.HashJoinMoreIT
> [ERROR] testBug2961(org.apache.phoenix.end2end.join.HashJoinMoreIT) Time
> elapsed: 2.42 s <<< ERROR!
> java.lang.IllegalArgumentException: 6 > 5
> at
> org.apache.phoenix.end2end.join.HashJoinMoreIT.testBug2961(HashJoinMoreIT.java:898)
> {code}