I see. In our case we use SingleBufferInputStream, so the time is spent duplicating the backing byte buffer.
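[Editor's note: the cheap-duplicate vs. expensive-copy distinction discussed below can be sketched as follows. This is an illustrative sketch of the idea, not Parquet's actual SingleBufferInputStream/ByteBufferInputStream code; both method names are hypothetical.]

```java
import java.nio.ByteBuffer;

public class SliceCostSketch {
    // Fast path: the request fits in the current backing buffer, so slicing
    // is just a duplicate() plus a limit -- an O(1) view, no bytes copied.
    // (Hypothetical sketch, not Parquet's implementation.)
    static ByteBuffer sliceWithinBuffer(ByteBuffer current, int length) {
        ByteBuffer view = current.duplicate();   // shares the backing array
        view.limit(view.position() + length);
        current.position(current.position() + length);
        return view;
    }

    // Slow path: the request spans two backing buffers, so the bytes must be
    // copied into a freshly allocated buffer -- O(length) work plus allocation.
    static ByteBuffer sliceAcrossBuffers(ByteBuffer first, ByteBuffer second, int length) {
        ByteBuffer copy = ByteBuffer.allocate(length);
        while (copy.hasRemaining() && first.hasRemaining()) {
            copy.put(first.get());               // drain the first buffer
        }
        while (copy.hasRemaining()) {
            copy.put(second.get());              // finish from the second buffer
        }
        copy.flip();
        return copy;
    }
}
```

With a single backing buffer (the SingleBufferInputStream case above), only the fast path is ever taken, which is why the profiled cost reduces to duplicating the buffer.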
Thanks,
Chang

Ryan Blue <b...@apache.org> wrote on Tue, Sep 15, 2020 at 2:04 AM:

> Before, the input was a byte array, so we could read from it directly. Now
> the input is a `ByteBufferInputStream` so that Parquet can choose how to
> allocate buffers. For example, we use vectored reads from S3 that pull back
> multiple buffers in parallel.
>
> Now that the input is a stream based on possibly multiple byte buffers, it
> provides a method to get a buffer of a certain length. In most cases, that
> will create a ByteBuffer with the same backing byte array, but it may need
> to copy if the request spans multiple buffers in the stream. Most of the
> time, the call to `slice` only requires duplicating the buffer and setting
> its limit, but a read that spans multiple buffers is expensive. It would be
> helpful to know whether the time is spent copying data, which would
> indicate the backing buffers are too small, or whether it is spent
> duplicating the backing byte buffer.
>
> On Mon, Sep 14, 2020 at 5:29 AM Sean Owen <sro...@gmail.com> wrote:
>
>> Ryan, do you happen to have any opinion here? That particular section
>> was introduced in the Parquet 1.10 update:
>> https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
>> It looks like it previously read from `in` directly rather than making a
>> ByteBuffer each time.
>>
>> On Sun, Sep 13, 2020 at 10:48 PM Chang Chen <baibaic...@gmail.com> wrote:
>>>
>>> I think we can copy all the encoded data into a ByteBuffer once, and
>>> unpack the values in the loop:
>>>
>>> while (valueIndex < this.currentCount) {
>>>   // values are bit packed 8 at a time, so reading bitWidth will always work
>>>   this.packer.unpack8Values(buffer, buffer.position() + valueIndex,
>>>       this.currentBuffer, valueIndex);
>>>   valueIndex += 8;
>>> }
>>>
>>> Sean Owen <sro...@gmail.com> wrote on Mon, Sep 14, 2020 at 10:40 AM:
>>>>
>>>> It certainly can't be called once - it's reading different data each
>>>> time. There might be a faster way to do it, I don't know.
>>>> Do you have ideas?
>>>>
>>>> On Sun, Sep 13, 2020 at 9:25 PM Chang Chen <baibaic...@gmail.com> wrote:
>>>>>
>>>>> Hi experts,
>>>>>
>>>>> It looks like there is a hot spot in
>>>>> VectorizedRleValuesReader#readNextGroup():
>>>>>
>>>>> case PACKED:
>>>>>   int numGroups = header >>> 1;
>>>>>   this.currentCount = numGroups * 8;
>>>>>
>>>>>   if (this.currentBuffer.length < this.currentCount) {
>>>>>     this.currentBuffer = new int[this.currentCount];
>>>>>   }
>>>>>   currentBufferIdx = 0;
>>>>>   int valueIndex = 0;
>>>>>   while (valueIndex < this.currentCount) {
>>>>>     // values are bit packed 8 at a time, so reading bitWidth will always work
>>>>>     ByteBuffer buffer = in.slice(bitWidth);
>>>>>     this.packer.unpack8Values(buffer, buffer.position(),
>>>>>         this.currentBuffer, valueIndex);
>>>>>     valueIndex += 8;
>>>>>   }
>>>>>
>>>>> Per my profile, the code spends about 30% of readNextGroup()'s time in
>>>>> `slice`. Why can't we call `slice` outside of the loop?

--
Ryan Blue
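[Editor's note: Chang Chen's "slice once per packed run" idea can be sketched as below. Everything here is hypothetical: `readPackedRun` is an invented name, and the stand-in `unpack8Values` hardcodes bitWidth = 8 (one byte per value) to stay self-contained, unlike the real BytePacker. The caveat from the thread still applies: this only works when the whole run sits in one backing buffer; with a ByteBufferInputStream a run may span buffers, which is exactly why the per-group `slice(bitWidth)` exists.]

```java
import java.nio.ByteBuffer;

public class SliceOnceSketch {
    // Trivial stand-in for the packer with bitWidth = 8, so 8 packed values
    // occupy exactly 8 bytes. The real BytePacker handles any bit width.
    static void unpack8Values(ByteBuffer in, int inPos, int[] out, int outPos) {
        for (int i = 0; i < 8; i++) {
            out[outPos + i] = in.get(inPos + i) & 0xFF;
        }
    }

    // Slice the whole packed run once, then index into that single buffer for
    // each group of 8 values, instead of calling slice(bitWidth) per group.
    static int[] readPackedRun(ByteBuffer in, int numGroups) {
        final int bitWidth = 8;                       // bytes per group in this sketch
        int[] values = new int[numGroups * 8];
        ByteBuffer run = in.duplicate();              // one O(1) slice for the run
        run.limit(run.position() + numGroups * bitWidth);
        in.position(in.position() + numGroups * bitWidth);
        for (int valueIndex = 0; valueIndex < values.length; valueIndex += 8) {
            // byte offset of this group inside the single sliced buffer
            int inPos = run.position() + (valueIndex / 8) * bitWidth;
            unpack8Values(run, inPos, values, valueIndex);
        }
        return values;
    }
}
```

Note that the offset arithmetic must be in bytes ((valueIndex / 8) * bitWidth), not in values; the snippet quoted earlier in the thread passed `buffer.position() + valueIndex`, which only coincides with the byte offset when bitWidth happens to be 8.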