Hi Igor,

You are right that Drill vectors are also subject to fragmentation (both 
internal and external). That is a bug we want to fix. So it does not help us to 
move from one implementation that suffers from fragmentation to another one 
that also suffers from fragmentation.

In another e-mail you asked if we could create an operator that reads both 
Drill and Arrow vectors. The short answer is "yes." It seems that Presto can 
already do this (with its own memory layout options; AFAIK, Presto does not 
support Arrow).

The column readers and writers are defined as interfaces. We have Drill 
implementations; we could create Arrow implementations. Then operator code 
should be able to use either implementation transparently -- as long as the 
operator either passes vectors through unchanged or completely rewrites them.
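
Just to make the idea concrete, here is a rough sketch (hypothetical names; 
the real EVF interfaces are richer, this is only the shape of the thing):

  // Operator code sees only the interfaces, never the concrete vector class.
  interface ScalarReader {
    boolean next();          // advance to the next row, false at end of batch
    int getInt();            // read the current value
  }

  interface ScalarWriter {
    void setInt(int value);  // append a value to the output column
  }

  // An expression such as "a + 1" is written once and works the same
  // whether the reader and writer are backed by Drill value vectors or
  // by Arrow vectors.
  class AddOneProjection {
    static void project(ScalarReader in, ScalarWriter out) {
      while (in.next()) {
        out.setInt(in.getInt() + 1);
      }
    }
  }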

The real challenge is that most operators both copy existing vectors to the 
output and create new vectors. Consider Project. It passes some vectors 
directly from input to output. And, if the query contains an expression, 
Project reads the input and writes a new output vector. This is where things 
get complex. If the input is Arrow but the output is Drill, we'd have to copy 
all the data; same if the input is Drill but the output is Arrow. Maybe we 
could design Project so it creates the same kind of output vectors as its input?
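
A sketch of that last idea (again, hypothetical names throughout; nothing 
called VectorFamily exists today):

  // Tag each incoming batch with the family of vectors it carries, and have
  // Project build its computed columns in that same family, so pass-through
  // columns never need a cross-format copy.
  enum VectorFamily { DRILL, ARROW }

  interface OutputBatchBuilder { /* creates writers for computed columns */ }

  class DrillBatchBuilder implements OutputBatchBuilder { }
  class ArrowBatchBuilder implements OutputBatchBuilder { }

  class ProjectSetup {
    static OutputBatchBuilder builderFor(VectorFamily inputFamily) {
      // Match the input: Drill in, Drill out; Arrow in, Arrow out.
      return inputFamily == VectorFamily.ARROW
          ? new ArrowBatchBuilder()
          : new DrillBatchBuilder();
    }
  }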

Even more complex: exchange operators would have to handle both kinds of 
vectors. If the receiving operator can only handle Drill vectors, say, then 
Exchange would have to reformat the source vectors (either Drill or Arrow, 
depending on the upstream operator) to match the needs of the downstream 
operator.

All this sounds rather complex. We're still fixing bugs in the Drill 
implementation; now we'd have to fix bugs in the Arrow version and in all the 
possible combinations. So it is better to pick one solution and figure out a 
minimal-cost migration plan to that solution rather than trying to support 
multiple solutions.


One thing to keep in mind about both Drill and Arrow vectors is that they are 
designed to use the same memory layout for compute as for network transfers. 
That's why the buffers are in direct memory: so Netty can copy from those 
buffers to the network interface, and from the network interface into 
direct-memory buffers.
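
In Netty terms, the effect is roughly this (a sketch only; Drill's real 
allocation path is more elaborate, but the principle is the same):

  import io.netty.buffer.ByteBuf;
  import io.netty.buffer.PooledByteBufAllocator;

  class DirectBufferSketch {
    public static void main(String[] args) {
      // Vector data lives in a direct (off-heap) buffer...
      ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(1024);
      try {
        buf.writeInt(42);  // ...written in the compute-time layout...
        // ...and that same buffer is what Netty hands to the socket, so
        // there is no extra copy between a "compute" format and a "wire"
        // format.
      } finally {
        buf.release();
      }
    }
  }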

It seems that Presto serializes rows to JSON(!) for RPC. Maybe also Thrift? 
This makes it easy to translate between memory formats: no single format has 
the special privilege of being the wire protocol. Of course, data exchange is 
thus likely less efficient than in Drill or Arrow. (Anyone have more 
knowledge of this area?) As another example, Impala uses rows. Impala is in C++ 
and so gains the advantage of direct memory transfers to/from the network (at 
the cost of having to maintain its own memory allocation scheme).

The simplest memory management, of course, is to use on-heap byte arrays. 
There are no fragmentation issues because the GC will relocate blocks to avoid 
fragmentation. But it seems that Netty must copy data between the heap and 
direct memory to send it over the network, so we pay that extra data-copy 
cost. I wonder if there is anything new in this area that would help us.
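
The copy I mean, again in Netty terms (sketch only; whether we or the library 
do the copy, it happens somewhere):

  import io.netty.buffer.ByteBuf;
  import io.netty.buffer.PooledByteBufAllocator;

  class HeapCopySketch {
    public static void main(String[] args) {
      // Column data in an ordinary on-heap array: the GC compacts the heap,
      // so fragmentation is not our problem...
      byte[] column = new byte[1024];

      // ...but before those bytes can go out a socket they get copied into
      // a direct buffer (by us, or by Netty/the JDK internally).
      ByteBuf wire = PooledByteBufAllocator.DEFAULT.directBuffer(column.length);
      try {
        wire.writeBytes(column);  // the extra copy we pay for heap storage
      } finally {
        wire.release();
      }
    }
  }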

Finally, you raise a very good point about Arrow adoption. This is why we, as a 
project, need to figure out our goals. That will tell us if we should develop 
the optimal data representation for Drill, or if we are better off following 
what someone else has done.

There is no magic answer; it is just plain old technical design and tradeoffs. 
Very glad to see us engaged in that process.


Thanks,
- Paul

 

    On Friday, January 10, 2020, 01:48:25 PM PST, Igor Guzenko 
<ihor.huzenko....@gmail.com> wrote:  
 
 Hi Paul,

I would like to add that, from your wiki, it seems that Drill vectors also have
the same fragmentation issues described as the first problem. So I don't
think that this can be a reason to abandon Arrow completely now.

About the second problem, I agree that this might be a big issue. But it
seems that other open-source tools have not widely adopted Arrow, and it looks
like the high-performance native readers are implemented by Arrow backers as
proprietary software. So most probably we won't have such problems, because we
will use our own readers for most storage plugins.

Although it will not matter at all if, as a result, we choose our own path
of development.

Thanks,
Igor



On Fri, Jan 10, 2020 at 9:17 PM Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi All,
>
> Glad to see the Arrow discussion heating up and that it is causing us to
> ask deeper questions.
>
> Here I want to get a bit techie on everyone and highlight two potential
> memory management problems with Arrow.
>
> First: memory fragmentation. Recall that this is how we started on the EVF
> path. Arrow allocates large, variable-size blocks of memory. To quote a
> 35-year-old DB paper [1]: "[V]ariable-sized pages would cause heavy
> fragmentation problems."
>
> Second: the idea of Arrow is that tool A creates a set of vectors that
> tool B will consume. This means that tools A and B have to agree on vector
> (buffer) size. Suppose tool A wants really big batches, but B can handle
> only small batches. In a columnar system, there is no good way to split a
> big batch into smaller ones. One can copy values, but this is exactly what
> Arrow is supposed to avoid.
>
> Hence, when using Arrow, a data producer dictates to Drill a crucial
> factor in memory management: batch size. And, Drill dictates batch size to
> its clients. It will require complex negotiation logic. All to avoid a copy
> when the tools will communicate via RPC anyway. This is, in the larger
> picture, not a very good design at all. Needless to say, I am personally
> very skeptical of the benefits.
>
> A possible better alternative, one that we prototyped some time back, is
> to base Drill memory on fixed-size "blocks", say 1 MB in size. Any given
> vector can use part of a block, a whole block, or multiple blocks to store data.
> The blocks are at least as large as the CPU cache lines, so we retain that
> benefit. Memory management is now far easier, and we can exploit 40 years
> of experience in effective buffer management. (Plus, the blocks are easy to
> spill to disk using classic RDBMS algorithms.)
>
> Point is: let's not blindly accept the work that Arrow has done. Let's do
> our homework to figure out the best system for Drill: whether that be
> Arrow, fixed-size buffers, the current vectors, or something else entirely.
>
> Thanks,
> - Paul
>
>
>
> [1]
> http://users.informatik.uni-halle.de/~hinnebur/Lehre/2008_db_iib_web/uebung3_p560-effelsberg.pdf
  
