[ https://issues.apache.org/jira/browse/PHOENIX-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887473#comment-16887473 ]

James Taylor commented on PHOENIX-5290:
---------------------------------------

First, a big THANK YOU, [~lhofhansl], for spearheading this effort to get the 
test runs passing again. You've done an amazing job!
{quote}Do multiple expressions cause a separator byte *each at the end*?!
{quote}
Yes, potentially. The reason is that, while traversing, when you encounter a 
null/empty value you don't yet know whether it will be at the end: there might 
be more expressions afterwards that contain values (in which case you'd keep 
the null values).

For example, if you have four VARCHAR expressions with values `a`, `b`, `\0` 
(a null value, i.e. only its separator) and `c`, you'd end up keeping the `\0` 
byte. If you have `a`, `b`, `\0`, `\0`, you'd end up trimming the last two (and 
note I've left off the trailing `\0` you'd have after each value above as well).
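To make the trimming concrete, here is a simplified sketch, assuming every row key column is an ascending, nullable VARCHAR so that the separator is always `0x00`. `RowKeySketch`, `buildKey` and `show` are hypothetical names for illustration only, not Phoenix's actual API; the real encoding also has to deal with fixed-width and descending columns.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Phoenix code), assuming all columns are ascending,
 * nullable VARCHARs: each value is followed by a 0x00 separator, a null value
 * contributes only its separator, and trailing separators are trimmed at the end.
 */
class RowKeySketch {
    static final byte SEPARATOR = 0x00;

    static byte[] buildKey(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            if (v != null) {
                byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
            // Written "blindly": while traversing we can't know whether a
            // later expression will still contain a value.
            out.write(SEPARATOR);
        }
        byte[] key = out.toByteArray();
        // Only once every value is written do we know which separators are
        // trailing, so all of them are dropped here in one go.
        int len = key.length;
        while (len > 0 && key[len - 1] == SEPARATOR) {
            len--;
        }
        return Arrays.copyOf(key, len);
    }

    /** Renders separator bytes as the two characters \0 for display. */
    static String show(byte[] key) {
        return new String(key, StandardCharsets.UTF_8).replace("\0", "\\0");
    }

    public static void main(String[] args) {
        System.out.println(show(buildKey("a", "b", null, "c")));  // a\0b\0\0c
        System.out.println(show(buildKey("a", "b", null, null))); // a\0b
    }
}
```

The separators can only be dropped after the last value has been written, which is why a key ending in several null values loses several separator bytes at once, while a null in the middle keeps its `\0`.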

[~dbwong]: it'd be good if there were more abstraction around this on the 
"creating the bytes" side of things; there's too much duplication of similar 
code. The abstraction on the read side is a little bit better with the 
RowKeySchemaAccessor class.
{quote}We append separators somewhat blindly after writing each field when 
generating the row key.  Since row keys have length it is unnecessary to store 
the trailing key so we trim it.  If we were to have multiple variable length 
fields, (or nullable secondary indexes) we could in theory trim those as well.
{quote}
There are very specific and crucial reasons for generating the row keys as we 
do and I wouldn't recommend changing that (both for the b/w compat nightmare 
it'd cause and so as not to introduce a correctness issue). Some thorough 
documentation would be good, though. The binary sorting of rows must match the 
natural sort order of the rows (which is hard to get right, especially with 
descending row keys thrown in the mix). Row keys do not store the length: we 
rely on either the fixed width (based on the type) or the separator 
byte. Trailing null bytes must be trimmed in order for queries on preceding 
columns to work correctly. This comes into play especially when you add a new 
column to the row key (without requiring existing rows to be changed).
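The read side and the compatibility argument can be sketched the same way, again assuming all row key columns are ascending, nullable VARCHARs. `RowKeyCompatSketch`, `encode` and `decode` are hypothetical names, not Phoenix's RowKeySchemaAccessor; the sketch only shows that a key with no stored lengths can be walked by separator, that a trimmed key reads back with trailing nulls, and that the `0x00` separator keeps binary order equal to natural order. `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Phoenix code), assuming every row key column is an
 * ascending, nullable VARCHAR. No lengths are stored: fields are located by
 * scanning for the 0x00 separator, and columns missing at the end read as null.
 */
class RowKeyCompatSketch {
    static final byte SEPARATOR = 0x00;

    /** Value bytes plus a separator per column, then trailing separators trimmed. */
    static byte[] encode(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            if (v != null) {
                byte[] b = v.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            out.write(SEPARATOR);
        }
        byte[] key = out.toByteArray();
        int len = key.length;
        while (len > 0 && key[len - 1] == SEPARATOR) {
            len--;
        }
        return Arrays.copyOf(key, len);
    }

    /** Decodes a key against a schema with columnCount columns. */
    static List<String> decode(byte[] key, int columnCount) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int col = 0; col < columnCount; col++) {
            if (start >= key.length) {
                fields.add(null); // column was trimmed off the end: null
                continue;
            }
            int end = start;
            while (end < key.length && key[end] != SEPARATOR) {
                end++;
            }
            fields.add(end == start ? null
                    : new String(key, start, end - start, StandardCharsets.UTF_8));
            start = end + 1; // step over the separator
        }
        return fields;
    }

    public static void main(String[] args) {
        // A row written before a third nullable PK column was added...
        byte[] oldRow = encode("a", "b");
        // ...has exactly the bytes of a new row whose third column is null,
        // so existing rows don't need to be rewritten.
        System.out.println(Arrays.equals(oldRow, encode("a", "b", null))); // true
        System.out.println(decode(oldRow, 3)); // [a, b, null]
        // Binary order matches natural order: ("a", "x") < ("ab", "a").
        System.out.println(Arrays.compareUnsigned(encode("a", "x"), encode("ab", "a")) < 0); // true
    }
}
```

Without the separator, (`a`, `x`) would be stored as `ax` and sort after `aba` from (`ab`, `a`), so the binary order of the rows would no longer match their natural order.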

> HashJoinMoreIT is flapping
> --------------------------
>
>                 Key: PHOENIX-5290
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5290
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.15.0, 4.14.1, 5.1.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Major
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: 5290-combined.txt, 5290-failure.txt, 5290-v2.txt, 
> 5290-v3.txt, 5290.txt
>
>
> {code}
> [INFO] Running org.apache.phoenix.end2end.join.HashJoinMoreIT
> [ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 91.509 s <<< FAILURE! - in org.apache.phoenix.end2end.join.HashJoinMoreIT
> [ERROR] testBug2961(org.apache.phoenix.end2end.join.HashJoinMoreIT)  Time 
> elapsed: 2.42 s  <<< ERROR!
> java.lang.IllegalArgumentException: 6 > 5
>         at 
> org.apache.phoenix.end2end.join.HashJoinMoreIT.testBug2961(HashJoinMoreIT.java:898)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)