Hi,

I’m back from vacation and have finalized the offset test.
I found that memory is not necessarily 64-byte aligned, so the offset is now calculated before ArrowBuf initialization.

https://github.com/apache/arrow/pull/98

-Kiril

> On Jun 16, 2016, at 21:32, Kiril Menshikov <kmenshi...@gmail.com> wrote:
>
> Yes, I’d rather write the test.
>
>> On Jun 16, 2016, at 18:59, Jacques Nadeau <jacq...@apache.org> wrote:
>>
>>> Netty buffer always allocate memory aligned to 64-bytes. So each new
>>> ArrowBuf will be aligned to 64-bytes as well, with offset = 0.
>>>
>>
>> You confirmed that both the Netty chunk as well as buffer allocations
>> (ArrowBufs returned from here [1]) are on 64-byte offsets? Can you maybe
>> write some tests/add some assertion to the code so we protect against that
>> changing?
>>
>>>
>>> I don't fully understand why new allocations should be on 64-bytes
>>> offset?
>>>
>>
>> As part of the Arrow spec, each separate piece of memory must have 64
>> byte-sized-word alignment and 64 byte padding. For example, if you have
>> NullableVarChar, you'll need three buffers: nullable bits, four byte
>> offsets and data buffer. Each of those must be on a 64 byte offset and be a
>> length that is a multiple of 64 bytes.
>>
>> [1]
>> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BufferAllocator.java#L37
>>
>>>
>>> -Kiril
>>>
>>> On Jun 14, 2016, at 00:22, Jacques Nadeau <jacq...@apache.org> wrote:
>>>
>>> Yes, I think there are two main components. Also, I accidentally said 64
>>> bits when I should have said 64 bytes.
>>>
>>> 1. New allocations should be on 64 byte offsets
>>> 2. Serializing existing vectors must be done such that they are always in
>>> an increment of 64 bytes. This is necessary to avoid copying when sending
>>> across the wire, otherwise the receiver would need to slice up/copy the
>>> incoming datastream. This would be done by ensuring that the setValueCount
>>> and similar operations (capacity) are done at the right range. I'd expect
>>> this second one to be best done on top of Steven's work.
>>>
>>> On Mon, Jun 13, 2016 at 2:14 PM, Kiril Menshikov <kmenshi...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> Does this mean that offset must be adjusted depending on the UDLE memory.
>>> So new memory address will be align to 64 bits?
>>>
>>> The first thing we should do for the alignment in Java is adjust the
>>> allocator so that it always allocates on a 64 bit offset. Does someone
>>> want to look at that?
>>>
>>> Thanks,
>>> -Kiril
>>>
>>> On Jun 11, 2016, at 22:45, Jacques Nadeau <jacq...@apache.org> wrote:
>>>
>>> Steven is on vacation for a couple of days. His focus as I understand it
>>> is rationalizing the code so it is cleaner, correct for arrow versus drill
>>> representation differences (such as decimal, nulls, etc) and has more
>>> unit tests. Once he gets back in the next day or two, hopefully he can
>>> post a wip patch.
>>>
>>> The first thing we should do for the alignment in Java is adjust the
>>> allocator so that it always allocates on a 64 bit offset. Does someone
>>> want to look at that?
>>> On Jun 10, 2016 5:35 PM, "Gaurav Agarwal" <gaurav130...@gmail.com>
>>> wrote:
>>>
>>> I am also interested on this . Do we need to know drill before start
>>> implementing not a for arrow .
>>> On Jun 10, 2016 9:45 PM, "Wail Alkowaileet" <wael....@gmail.com> wrote:
>>>
>>> On Wed, Jun 8, 2016 at 9:26 PM, Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>> Hi Steven,
>>> Is the patch focused on the alignment/padding. Or are there other
>>> issues as well?
>>>
>>> I'm interested on this as well....
>>>
>>> Thanks,
>>> Micah
>>>
>>> On Tue, Jun 7, 2016 at 11:22 PM, Steven Phillips <ste...@dremio.com>
>>> wrote:
>>>
>>> I am currently working on a patch that addresses this, as well as
>>> removing some of the residual code from Drill that isn't really needed in
>>> Arrow, (such as the Drill types, MaterializedField, etc.)
>>>
>>> I will be posting this within a few days.
>>>
>>> On Tue, Jun 7, 2016 at 5:54 PM, Leif Walsh <leif.wa...@gmail.com>
>>> wrote:
>>>
>>> I am also interested in this.
>>> On Tue, Jun 7, 2016 at 17:37 Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>> Hi Everyone,
>>>
>>> I'm looking to help get started with Arrow & Spark and to that end
>>> I'd like to start with getting the Java implementation closer to the spec
>>> / C implementation. I'm wondering what places people know the
>>> differences are between the two?
>>>
>>> Cheers,
>>>
>>> Holden :)
>>>
>>> --
>>> --
>>> Cheers,
>>> Leif
>>>
>>> --
>>>
>>> *Regards,*
>>> Wail Alkowaileet
>>>
>
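
For anyone following the thread, here is a rough sketch of the arithmetic behind the two requirements discussed above: starting each buffer on a 64-byte boundary, and padding its length to a multiple of 64 bytes. This is not the code from the pull request. The class and method names (Alignment, offsetToBoundary, paddedLength) are made up for illustration and are not part of the Arrow or Netty APIs, and the sketch assumes the raw memory address is already available as a long.

```java
// Illustrative sketch only. Alignment, offsetToBoundary and paddedLength are
// hypothetical names, not part of the Arrow or Netty APIs.
final class Alignment {
    // Alignment boundary in bytes, as discussed in the thread.
    static final int BOUNDARY = 64;

    // Number of bytes to skip from `address` to reach the next 64-byte
    // boundary. Returns 0 when the address is already aligned.
    static int offsetToBoundary(long address) {
        int misalignment = (int) (address & (BOUNDARY - 1));
        return misalignment == 0 ? 0 : BOUNDARY - misalignment;
    }

    // Rounds `length` up to the nearest multiple of 64 bytes.
    // Assumes length is non-negative.
    static long paddedLength(long length) {
        return (length + BOUNDARY - 1) & ~((long) BOUNDARY - 1);
    }

    public static void main(String[] args) {
        long address = 1000L; // assumed example address, not 64-byte aligned
        int offset = offsetToBoundary(address);
        System.out.println("offset = " + offset);
        System.out.println("aligned address = " + (address + offset));
        System.out.println("padded length of 65 = " + paddedLength(65L));
    }
}
```

With an offset computed this way before the buffer is created, the usable region starts at address + offset, and a request for n bytes would need to reserve up to paddedLength(n) + 63 bytes to leave room for the shift.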