chenpingzeng commented on issue #1512:
URL: https://github.com/apache/orc/issues/1512#issuecomment-1558774853

   Thanks for pointing me to that PR. Its key intention was to avoid using 
memset to set notNull.data() to 1, because of the small performance cost.
   I would like to share some experience with consuming the result of 
orc::RowReader::next. In the 99-query TPC-DS test on a 3 TB data set, we saw 
no obvious performance improvement when we copied 
orc::StructVectorBatch.fields[i] directly into the destination object's memory 
and, when hasNulls=0, bypassed the unnecessary check of notNull.data()[i], 
although that refactoring is certainly worthwhile in itself. So I think that 
ensuring all values in notNull.data() are 1 when hasNulls=0 would not be a 
significant waste of performance.
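
   To make the hasNulls=0 fast path concrete, here is a minimal sketch. It 
does not use the real ORC headers: MockLongBatch, DestColumn and copyColumn 
are hypothetical stand-ins whose field names (numElements, hasNulls, notNull, 
data) only mirror the ones discussed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an ORC long column batch (not the real type).
struct MockLongBatch {
  std::size_t numElements = 0;
  bool hasNulls = false;
  std::vector<char> notNull;       // only meaningful when hasNulls is true
  std::vector<std::int64_t> data;
};

// Hypothetical destination column owned by the consuming engine.
struct DestColumn {
  std::vector<std::int64_t> values;
  std::vector<bool> isNull;
};

// Copies one batch. When hasNulls is false, notNull is never read, so any
// stale zeros left in that buffer cannot turn valid rows into nulls.
DestColumn copyColumn(const MockLongBatch& batch) {
  const std::size_t n = batch.numElements;
  assert(batch.data.size() >= n);

  DestColumn out;
  out.values.resize(n);
  out.isNull.assign(n, false);

  if (!batch.hasNulls) {
    // Fast path: bulk copy, no per-row null check.
    std::copy_n(batch.data.begin(), n, out.values.begin());
    return out;
  }

  // Slow path: consult notNull for every row.
  assert(batch.notNull.size() >= n);
  for (std::size_t i = 0; i < n; ++i) {
    if (batch.notNull[i]) {
      out.values[i] = batch.data[i];
    } else {
      out.isNull[i] = true;
    }
  }
  return out;
}
```

   With this shape, a batch whose notNull buffer still holds zeros from an 
earlier read is copied correctly as long as hasNulls is checked first.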
   On the other hand, our misuse of notNull.data() went unnoticed for half a 
year, until a TPC-DS 99-query consistency check caught it very recently. As I 
mentioned in the issue, it was extremely difficult to find a condition that 
isolates the problem rows, since the store_sales table has over 8 billion 
records, and other scenarios have even more. That is to say, from an expert's 
or all-knowing point of view, it is certainly the user's mistake to read 
notNull.data() when hasNulls=0. But has anyone considered this question: 
should we stop users from stepping into this trap? (Yes, I think it is a trap 
that notNull.data() can contain 0 values when hasNulls=0; in my opinion that 
is a data inconsistency.)
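
   One way a caller could guard against this trap on its own side is 
sketched below. MockNullFlags, isNullAt and countStaleNullFlags are 
hypothetical names for illustration only; they are not part of the ORC API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the null-tracking part of a column batch.
struct MockNullFlags {
  std::size_t numElements;
  bool hasNulls;
  std::vector<char> notNull;  // only meaningful when hasNulls is true
};

// Single access point for null checks: notNull is read only if hasNulls is
// set, so callers cannot accidentally trust a stale buffer.
inline bool isNullAt(const MockNullFlags& batch, std::size_t row) {
  if (!batch.hasNulls) {
    return false;
  }
  assert(row < batch.notNull.size());
  return batch.notNull[row] == 0;
}

// Debug aid: counts rows whose notNull byte is 0 although hasNulls is false,
// i.e. the state described above as inconsistent.
inline std::size_t countStaleNullFlags(const MockNullFlags& batch) {
  if (batch.hasNulls) {
    return 0;
  }
  const std::size_t n = std::min(batch.numElements, batch.notNull.size());
  const auto last = batch.notNull.begin() + static_cast<std::ptrdiff_t>(n);
  return static_cast<std::size_t>(
      std::count(batch.notNull.begin(), last, char{0}));
}
```

   A check like countStaleNullFlags could run in a debug build after each 
batch to surface the condition early instead of after billions of rows.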
   
   From: Gang Wu ***@***.***>
   Sent: May 23, 2023 15:32
   To: apache/orc ***@***.***>
   Cc: Chenpingzeng ***@***.***>; Mention ***@***.***>
   Subject: Re: [apache/orc] [C++] RowReaderImpl::next return inconsistant data in 
certain case (Issue #1512)
   
   
   Hi @chenpingzeng<https://github.com/chenpingzeng>, 
https://github.com/apache/orc/pull/1469/files has discussed the same thing. 
Please check the comment below to see if it solves this issue.
   
   https://github.com/apache/orc/blob/main/c%2B%2B/include/orc/Vector.hh#L40-L44
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
