chenpingzeng commented on issue #1512:
URL: https://github.com/apache/orc/issues/1512#issuecomment-1558774853
Thanks for pointing me to the PR. Its key intention was to avoid using memset
to set notNull.data() to 1, because of the slight negative effect on
performance.
I would like to share some experience with using the result of
orc::RowReader::next. In the TPC-DS 99 test with a 3 TB data set, we saw no
obvious performance improvement when we copied
orc::StructVectorBatch.fields[i] directly into the destination object's memory
and, when hasNulls=0, bypassed the unnecessary check of notNull.data()[i],
although the refactoring itself is certainly worthwhile. So I think it would
not be a waste of performance to ensure that all values in notNull.data() are 1
when hasNulls=0.
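The copy path described above can be sketched roughly as follows. This is only
an illustration, not ORC code: MockLongBatch is a hypothetical stand-in that
mirrors just the numElements, hasNulls, notNull and data members of a column
batch, and DestColumn and copyColumn are made-up names for the destination
object and the copy routine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an ORC column batch; only the members used here.
struct MockLongBatch {
  std::size_t numElements = 0;
  bool hasNulls = false;
  std::vector<char> notNull;   // only meaningful when hasNulls is true
  std::vector<int64_t> data;
};

// Hypothetical destination object.
struct DestColumn {
  std::vector<int64_t> values;
  std::vector<char> isNull;    // 1 = null, 0 = value present
};

// Copies one column. When hasNulls is false, notNull is never read.
DestColumn copyColumn(const MockLongBatch& batch) {
  DestColumn dest;
  dest.values.assign(batch.numElements, 0);
  dest.isNull.assign(batch.numElements, 0);
  if (!batch.hasNulls) {
    // Fast path: every row is valid, so skip the per-row notNull check.
    for (std::size_t i = 0; i < batch.numElements; ++i) {
      dest.values[i] = batch.data[i];
    }
    return dest;
  }
  // Slow path: consult notNull for every row.
  for (std::size_t i = 0; i < batch.numElements; ++i) {
    if (batch.notNull[i]) {
      dest.values[i] = batch.data[i];
    } else {
      dest.isNull[i] = 1;
    }
  }
  return dest;
}
```

With hasNulls=0 the sketch copies every row regardless of what notNull
happens to contain, which is the behaviour a reader has to implement by hand
today.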
On the other hand, the problem of misusing notNull.data() went undetected for
half a year, until a very recent TPC-DS 99 consistency check. As I mentioned in
the issue, it was extremely difficult to work out the condition that identifies
the problem rows, since the store_sales table holds over 8 billion records,
and other scenarios have even more. That is to say, from an expert's or
all-knowing point of view, it is certainly the user's mistake to read
notNull.data() when hasNulls=0. But has anyone considered this question: should
we stop users from stepping into this trap? (Yes, I think it is a trap that
notNull.data() can contain 0 values when hasNulls=0; in my opinion it is a data
inconsistency.)
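One way a caller could guard against this on their own side is to route every
null test through a single helper that consults hasNulls first. The sketch
below assumes a hypothetical MockNullMask type carrying only the hasNulls and
notNull members, and isNullAt is an illustrative name, not an existing ORC
function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in carrying only the null-tracking members of a batch.
struct MockNullMask {
  bool hasNulls;
  std::vector<char> notNull;  // contents are not trusted unless hasNulls is true
};

// Returns true only if the batch declares nulls AND row i is marked null.
// When hasNulls is false, notNull is never read, so stale 0 bytes are ignored.
bool isNullAt(const MockNullMask& mask, std::size_t i) {
  return mask.hasNulls && mask.notNull[i] == 0;
}
```

Reading notNull only through such a helper means a stale 0 left in the
buffer cannot be mistaken for a null row.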
From: Gang Wu ***@***.***>
Sent: May 23, 2023 15:32
To: apache/orc ***@***.***>
Cc: Chenpingzeng ***@***.***>; Mention ***@***.***>
Subject: Re: [apache/orc] [C++] RowReaderImpl::next return inconsistant data in
certain case (Issue #1512)
Hi @chenpingzeng<https://github.com/chenpingzeng>,
https://github.com/apache/orc/pull/1469/files has discussed the same thing.
Please check the comment below to see if it solves this issue.
https://github.com/apache/orc/blob/main/c%2B%2B/include/orc/Vector.hh#L40-L44
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]