[ https://issues.apache.org/jira/browse/ORC-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653932#comment-17653932 ]

Penglei Shi edited comment on ORC-350 at 1/3/23 10:28 AM:
----------------------------------------------------------

In some cases, especially for columns that hold large JSON or text values, 
predicate pushdown is unnecessary, and column statistics and row indexes are 
rarely used. Can ORC disable column statistics entirely? That is, not only skip 
writing RowIndexEntry via `orc.create.index=false`, but also skip calls to 
methods like `updateString`/`updateDouble`/`updateXXX`. 
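To make the request concrete, here is a minimal sketch of what a per-column "no statistics" switch could look like on the write path. It is self-contained and does NOT use the ORC API: `StringColumnWriter`, `StringStats` and the `collectStatistics` flag are hypothetical names for illustration only. The point is that when statistics are disabled, the per-value `updateString` call is never made, rather than being made and discarded later.

```java
// Hypothetical sketch, NOT the ORC API: a string column writer whose
// statistics collection can be switched off per column.
final class StringStats {
    private String min;
    private String max;
    private long count;

    // Per-value statistics update, analogous in spirit to ORC's updateString.
    void updateString(String value) {
        count++;
        if (min == null || value.compareTo(min) < 0) min = value;
        if (max == null || value.compareTo(max) > 0) max = value;
    }

    long count() { return count; }
    String min() { return min; }
    String max() { return max; }
}

final class StringColumnWriter {
    // Hypothetical per-column flag; no such option is assumed to exist in ORC.
    private final boolean collectStatistics;
    private final StringStats stats = new StringStats();
    private long rowsWritten;

    StringColumnWriter(boolean collectStatistics) {
        this.collectStatistics = collectStatistics;
    }

    void write(String value) {
        rowsWritten++; // stands in for encoding the value into the data stream
        if (collectStatistics) {
            stats.updateString(value); // skipped entirely when disabled
        }
    }

    long rowsWritten() { return rowsWritten; }
    StringStats stats() { return stats; }
}
```

With `new StringColumnWriter(false)`, every value is still written, but the statistics object stays empty, so no per-value comparison or copying work is spent on large JSON/text values.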



> Optionally disable/specify indexes for columns
> ----------------------------------------------
>
>                 Key: ORC-350
>                 URL: https://issues.apache.org/jira/browse/ORC-350
>             Project: ORC
>          Issue Type: Sub-task
>            Reporter: Prasanth Jayachandran
>            Priority: Major
>
> There are many cases where an entire XML or large JSON document is stored as 
> a string column. If we auto-generate indexes on those columns, we often run 
> into issues with protobuf stream explosion. The only workaround for now is to 
> change the column type from string to binary. It would be good to have an 
> option to disable indexes on specific columns. 
> Regardless, I think we should have maximum limits on string column 
> statistics. If a limit is exceeded, PPD should handle it accordingly (by 
> returning YES_NO_NULL).
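One possible reading of the "max limit" proposal, as a self-contained sketch: once a value longer than the limit is seen, the column's min/max are dropped, and predicate evaluation falls back to YES_NO_NULL so the reader cannot skip the row group. Everything here is hypothetical: `BoundedStringStats`, `maxLength` and `canSkip` are illustrative names, and `TruthValue` is a local enum that only mirrors the constant names of ORC's `SearchArgument.TruthValue`. Null tracking is omitted for brevity, so out-of-range results are reported conservatively as NO_NULL.

```java
// Hypothetical sketch, not ORC's implementation. This enum is defined locally
// and only mirrors the constant names of ORC's SearchArgument.TruthValue.
enum TruthValue { YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL }

final class BoundedStringStats {
    private final int maxLength; // hypothetical limit on a value kept in min/max
    private String min;
    private String max;
    private boolean exceeded;    // once true, min/max are no longer tracked

    BoundedStringStats(int maxLength) { this.maxLength = maxLength; }

    void update(String value) {
        if (exceeded) return;
        if (value.length() > maxLength) {
            // Value too large to keep in statistics: drop min/max for this column.
            exceeded = true;
            min = null;
            max = null;
            return;
        }
        if (min == null || value.compareTo(min) < 0) min = value;
        if (max == null || value.compareTo(max) > 0) max = value;
    }

    // Evaluates the predicate "column = literal" against the statistics.
    TruthValue evaluateEquals(String literal) {
        if (exceeded || min == null) {
            // No usable statistics: answer "anything is possible".
            return TruthValue.YES_NO_NULL;
        }
        if (literal.compareTo(min) < 0 || literal.compareTo(max) > 0) {
            // Literal lies outside [min, max]: no row can match.
            return TruthValue.NO_NULL;
        }
        return TruthValue.YES_NO_NULL; // in range: the row group must be read
    }

    // A row group can be skipped only if no row can evaluate to YES.
    static boolean canSkip(TruthValue v) {
        return v == TruthValue.NO || v == TruthValue.NULL || v == TruthValue.NO_NULL;
    }
}
```

The design choice is to fail open: exceeding the limit never causes data to be skipped incorrectly, it only gives up the pruning opportunity for that column.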



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
