Thanks a lot for your reply. I will work around the lack of a constant
array for now, and hope to use one in the future.

Song

> On May 7, 2022, at 2:30 AM, Weston Pace <weston.p...@gmail.com> wrote:
> 
> Hi Song,
> 
> Wes proposed a couple of different array types a few months ago in
> [1].  These were documented in [2].  In this proposal a constant array
> type was suggested in addition to a run-length encoded array type.
> During the discussion it was suggested that a constant array might
> just be a special case of a run-length encoded array.  So there has
> been some discussion about adding support for this.  However, these
> ideas have not been implemented yet and I'm not aware of any PRs so it
> can be difficult to know if/when something may happen.
> 
> For now, you might be able to use arrow::compute::ExecBatch, which is
> what we use in the streaming execution engine to work around this
> problem.  An ExecBatch is a vector of datums, so each column can be
> either a scalar or an array.  The batch itself has a length, so if a
> batch with length 50 has a scalar column, that implies a constant array
> of 50 items.  However, this does complicate the logic (you constantly
> need to check whether a column is a scalar or an array), and I do hope
> the RLE array is added, as it could simplify a lot of this.
> 
> -Weston
> 
> [1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> [2] https://docs.google.com/document/d/12aZi8Inez9L_JCtZ6gi2XDbQpCsHICNy9_EUxj4ILeE/edit#heading=h.j2x776n0ymmp
> 
> On Thu, May 5, 2022 at 4:28 PM Dongxiao Song <songdongx...@hashdata.cn> wrote:
>> 
>> Hello,
>> 
>> I’m using Arrow C++ as the storage and compute layer of my own project,
>> which is a database based on PostgreSQL.
>> 
>> But when computing with a batch that contains a constant-value column, the
>> constant value has to be expanded into an array to be stored in the batch,
>> which is a waste of time and memory.
>> 
>> An Arrow::Scalar can be passed as a parameter to Arrow functions, but it
>> cannot represent a column in a batch. So if we want to compute on a batch
>> containing a constant-value column, expanding the value is unavoidable.
>> 
>> This occurs mainly before batch serialization and in functions like
>> FilterBatch.
>> 
>> A constant-type array may solve this problem. It would look like an Arrow
>> array but store only a single constant value and the number of rows. In
>> functions like Arrow::Sum, the result could even be computed by
>> multiplication.
>> 
>> Another solution is to allow a batch to contain an Arrow::Scalar.
>> 
>> All of this is just a suggestion from an Arrow user. I’m not sure whether
>> it is helpful for the Arrow project.
>> 
>> Thanks,
>> Song
> 
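As a rough illustration of the two ideas in this thread (a batch that
carries its own length, with each column being either a scalar or an
array, and a sum over a constant column reducing to a multiplication),
here is a minimal C++17 sketch. It deliberately does not use the Arrow
API: Batch, Column, Int64Array, SumColumn and Materialize are
hypothetical stand-ins invented for this example, and the real
arrow::compute::ExecBatch differs in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical stand-ins, NOT the Arrow API: a column is either one
// constant value (scalar) or a fully materialized array of values.
using Int64Array = std::vector<std::int64_t>;
using Column = std::variant<std::int64_t, Int64Array>;

// The batch carries its own length, so a scalar column implies `length`
// repetitions of the same value without storing them.
struct Batch {
  std::size_t length;
  std::vector<Column> columns;
};

// Sum one column. The scalar branch is the "constant array" shortcut:
// value * length instead of a loop over an expanded array.
std::int64_t SumColumn(const Batch& batch, std::size_t index) {
  const Column& column = batch.columns.at(index);
  if (const auto* scalar = std::get_if<std::int64_t>(&column)) {
    return *scalar * static_cast<std::int64_t>(batch.length);
  }
  const auto& values = std::get<Int64Array>(column);
  if (values.size() != batch.length) {
    throw std::invalid_argument("array column length != batch length");
  }
  std::int64_t total = 0;
  for (std::int64_t v : values) total += v;
  return total;
}

// Expand a column only when a consumer really needs an array
// (for example, just before serialization).
Int64Array Materialize(const Batch& batch, std::size_t index) {
  const Column& column = batch.columns.at(index);
  if (const auto* scalar = std::get_if<std::int64_t>(&column)) {
    return Int64Array(batch.length, *scalar);
  }
  return std::get<Int64Array>(column);
}
```

With a batch of length 50 whose first column is the scalar 7,
SumColumn(batch, 0) yields 350 without ever allocating 50 values. The
cost is the one noted in the thread: every consumer has to branch on
whether a column is a scalar or an array.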
