Github user ramkrish86 commented on a diff in the pull request:
https://github.com/apache/incubator-phoenix/pull/8#discussion_r9956836
--- Diff:
phoenix-core/src/main/java/org/apache/phoenix/expression/ArrayConstructorExpression.java
---
@@ -62,27 +63,54 @@ public void reset() {
position = 0;
Arrays.fill(elements, null);
}
-
+
@Override
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
- for (int i = position >= 0 ? position : 0; i < elements.length; i++) {
- Expression child = children.get(i);
- if (!child.evaluate(tuple, ptr)) {
- if (tuple != null && !tuple.isImmutable()) {
- if (position >= 0) position = i;
- return false;
+ try {
+ int offset = 0;
+ // track the elementlength for variable array
+ int noOfElements = children.size();
+ int elementLength = 0;
+ byteStream = new TrustedByteArrayOutputStream(estimatedSize);
+ oStream = new DataOutputStream(byteStream);
+ for (int i = position >= 0 ? position : 0; i < elements.length; i++) {
+ Expression child = children.get(i);
+ if (!child.evaluate(tuple, ptr)) {
+ if (tuple != null && !tuple.isImmutable()) {
+ if (position >= 0) position = i;
+ return false;
+ }
+ } else {
+ // track the offset position here from the size of the byteStream
+ if (!baseType.isFixedWidth()) {
+ offset = byteStream.size();
+ offsetPos[i] = offset;
--- End diff --
Take this case:
abc, null, bcd, null, null, b
With the above logic we record an offset for both nulls and non-nulls, so the
offsets would be:
0 4 4 10 10 10
Now during deserialization I know there are 6 elements, and we always need to
compare two successive offsets to get the length.
For the first element it would be 4 - 0 = 4.
The next element is null: 4 - 4 = 0 (fine, no problem).
For the next element it is 10 - 4 = 6, but this is wrong, because that range
already includes the separator byte plus the null value counter and its
separator. So we need some logic to adjust for this. I have done that; not a
problem. I track nulls here and add two bytes to currOff while reading the
next element.
The same happens for the last element. Because we cannot compare the last
offset with any other element, we only know its offset, and if there was a
null before it we need to adjust that offset to read the exact element.
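To make the arithmetic above concrete, here is a minimal sketch of the raw
"successive offsets" subtraction, before any separator or null-counter
adjustment. The class and method names (OffsetDiffSketch, rawGaps) are
hypothetical and not part of the pull request; the last element is reported as
-1 only to show that it has no successor to compare against.

```java
import java.util.Arrays;

class OffsetDiffSketch {
    // Raw gap between each offset and the next one.
    // The last element has no successor, so its gap is marked -1 (unknown).
    static int[] rawGaps(int[] offsets) {
        int[] gaps = new int[offsets.length];
        if (offsets.length == 0) {
            return gaps;
        }
        for (int i = 0; i < offsets.length - 1; i++) {
            gaps[i] = offsets[i + 1] - offsets[i];
        }
        gaps[offsets.length - 1] = -1;
        return gaps;
    }

    public static void main(String[] args) {
        // Offsets for: abc, null, bcd, null, null, b
        int[] offsets = {0, 4, 4, 10, 10, 10};
        System.out.println(Arrays.toString(rawGaps(offsets)));
        // prints [4, 0, 6, 0, 0, -1]
    }
}
```

The 6 reported for the third element is the unadjusted value discussed above;
a real reader would still have to skip the extra bytes.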
Now take a case where there are trailing nulls:
abc, null, bcd, null, ced, null
The offset array would be:
0 4 4 10 10 16
In this case:
1st element = 4 - 0 = 4
2nd element = 4 - 4 = 0 (so null)
3rd element = 10 - 4 = 6 (but I have applied logic to skip the separator
byte, so I am able to read this)
4th element = 10 - 10 = 0
5th element = 16 - 10 = 6 (adjust offset)
The 6th element is actually a null. But how do I know that? We have only the
last element's offset in hand, and using that alone we cannot infer the
presence of a null. Am I missing something here?
I tried changing the logic of how we add the offsets, but that again is not
easy to handle in deserialization. That is why I thought it better to write
the number of trailing nulls.
The logic that deals with the byte buffer should actually know how many nulls
there are, and how many of them are repeating, for us to fix the exact byte
buffer size. That again needs some tweaking, but I have used this logic to
find out the trailing nulls.
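For illustration only, counting trailing nulls on the writer side could look
roughly like the sketch below. It is not the code from this pull request; the
class and method names (TrailingNullsSketch, countTrailingNulls) are
hypothetical, and it works on plain Java objects rather than on evaluated
Phoenix expressions.

```java
class TrailingNullsSketch {
    // Walk backwards from the end and count nulls until the first non-null.
    static int countTrailingNulls(Object[] elements) {
        int count = 0;
        for (int i = elements.length - 1; i >= 0 && elements[i] == null; i--) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] first = {"abc", null, "bcd", null, null, "b"};
        String[] second = {"abc", null, "bcd", null, "ced", null};
        System.out.println(countTrailingNulls(first));  // prints 0
        System.out.println(countTrailingNulls(second)); // prints 1
    }
}
```

Writing this count next to the offsets would let the reader tell the two
example arrays apart even though only the last offset is available for the
final element.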