[ 
https://issues.apache.org/jira/browse/ORC-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned ORC-1986:
----------------------------------

    Assignee: Wan Kun

> Trigger flush stripe for large input rows
> -----------------------------------------
>
>                 Key: ORC-1986
>                 URL: https://issues.apache.org/jira/browse/ORC-1986
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Wan Kun
>            Assignee: Wan Kun
>            Priority: Major
>
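> For large input rows, a stripe may become excessively large, requiring more
> memory both to write and to read a single stripe.
> We can check the tree writer's size in bytes and flush the stripe even when
> the input row count is still below 5000. In the dump below, a single stripe
> holds only 5120 rows but 347494188 bytes of data, almost all of it in the
> column 2 DATA stream (345862763 bytes).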
> {code:java}
> Stripes:
>   Stripe: offset: 3 data: 347494188 rows: 5120 tail: 244 index: 2304
>     Stream: column 0 section ROW_INDEX start: 3 length 12
>     Stream: column 1 section ROW_INDEX start: 15 length 110
>     Stream: column 2 section ROW_INDEX start: 125 length 893
>     Stream: column 3 section ROW_INDEX start: 1018 length 31
>     Stream: column 4 section ROW_INDEX start: 1049 length 65
>     Stream: column 5 section ROW_INDEX start: 1114 length 923
>     Stream: column 6 section ROW_INDEX start: 2037 length 25
>     Stream: column 7 section ROW_INDEX start: 2062 length 155
>     Stream: column 8 section ROW_INDEX start: 2217 length 28
>     Stream: column 9 section ROW_INDEX start: 2245 length 31
>     Stream: column 10 section ROW_INDEX start: 2276 length 31
>     Stream: column 1 section DATA start: 2307 length 81853
>     Stream: column 1 section LENGTH start: 84160 length 2191
>     Stream: column 2 section DATA start: 86351 length 345862763
>     Stream: column 2 section LENGTH start: 345949114 length 13736
>     Stream: column 3 section DATA start: 345962850 length 22
>     Stream: column 3 section LENGTH start: 345962872 length 6
>     Stream: column 3 section DICTIONARY_DATA start: 345962878 length 5
>     Stream: column 4 section PRESENT start: 345962883 length 200
>     Stream: column 4 section DATA start: 345963083 length 6322
>     Stream: column 4 section LENGTH start: 345969405 length 495
>     Stream: column 4 section DICTIONARY_DATA start: 345969900 length 2919
>     Stream: column 5 section DATA start: 345972819 length 1507883
>     Stream: column 5 section LENGTH start: 347480702 length 7346
>     Stream: column 6 section DATA start: 347488048 length 22
>     Stream: column 6 section LENGTH start: 347488070 length 6
>     Stream: column 6 section DICTIONARY_DATA start: 347488076 length 0
>     Stream: column 7 section DATA start: 347488076 length 5795
>     Stream: column 7 section LENGTH start: 347493871 length 301
>     Stream: column 7 section DICTIONARY_DATA start: 347494172 length 2187
>     Stream: column 8 section DATA start: 347496359 length 22
>     Stream: column 8 section LENGTH start: 347496381 length 6
>     Stream: column 8 section DICTIONARY_DATA start: 347496387 length 4
>     Stream: column 9 section DATA start: 347496391 length 58
>     Stream: column 9 section LENGTH start: 347496449 length 6
>     Stream: column 9 section DICTIONARY_DATA start: 347496455 length 7
>     Stream: column 10 section DATA start: 347496462 length 22
>     Stream: column 10 section LENGTH start: 347496484 length 6
>     Stream: column 10 section DICTIONARY_DATA start: 347496490 length 5
>     Encoding column 0: DIRECT
>     Encoding column 1: DIRECT_V2
>     Encoding column 2: DIRECT_V2
>     Encoding column 3: DICTIONARY_V2[1]
>     Encoding column 4: DICTIONARY_V2[661]
>     Encoding column 5: DIRECT_V2
>     Encoding column 6: DICTIONARY_V2[1]
>     Encoding column 7: DICTIONARY_V2[682]
>     Encoding column 8: DICTIONARY_V2[1]
>     Encoding column 9: DICTIONARY_V2[2]
>     Encoding column 10: DICTIONARY_V2[1]
> {code}
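A simplified model may help show why the stripe above ends up with exactly 5120 rows, and what a size-based check would change. This is a hypothetical sketch, not ORC's actual `WriterImpl` logic. It assumes that rows arrive in batches of 1024 and that the stripe size is checked only after at least 5000 rows (the threshold mentioned in the description). It also assumes a 64 MiB target stripe size and a uniform row width taken from the column 2 DATA length above (345862763 bytes / 5120 rows, about 67.5 KB per row). `StripeFlushSketch`, `simulate`, and all parameter names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of when an ORC-style writer closes a stripe.
 * This is NOT org.apache.orc's WriterImpl; names and numbers are illustrative.
 */
public class StripeFlushSketch {

    /**
     * Writes totalRows rows in fixed-size batches and returns the row count of
     * every stripe that gets flushed (including the final stripe on close).
     *
     * @param checkSizeEveryBatch false: the size check runs only once at least
     *        rowsBetweenChecks rows have arrived since the last check;
     *        true: the size check also runs after every batch (the idea in ORC-1986)
     */
    public static List<Long> simulate(long totalRows, int batchSize, long bytesPerRow,
                                      long stripeSizeLimit, long rowsBetweenChecks,
                                      boolean checkSizeEveryBatch) {
        List<Long> stripeRows = new ArrayList<>();
        long rowsInStripe = 0;
        long bytesInStripe = 0;   // stands in for the tree writer's size in bytes
        long rowsSinceCheck = 0;
        long written = 0;
        while (written < totalRows) {
            long n = Math.min(batchSize, totalRows - written);
            written += n;
            rowsInStripe += n;
            rowsSinceCheck += n;
            bytesInStripe += n * bytesPerRow;
            boolean doCheck = checkSizeEveryBatch || rowsSinceCheck >= rowsBetweenChecks;
            if (doCheck) {
                rowsSinceCheck = 0;
                if (bytesInStripe >= stripeSizeLimit) {
                    // Flush the current stripe and start a new one.
                    stripeRows.add(rowsInStripe);
                    rowsInStripe = 0;
                    bytesInStripe = 0;
                }
            }
        }
        if (rowsInStripe > 0) {
            stripeRows.add(rowsInStripe); // remaining rows are flushed on close
        }
        return stripeRows;
    }

    public static void main(String[] args) {
        long bytesPerRow = 345_862_763L / 5_120;  // ~67.5 KB per row (column 2 above)
        long stripeSize = 64L * 1024 * 1024;      // assumed 64 MiB target stripe size
        // Prints "row-count gate: [5120]"
        System.out.println("row-count gate: "
            + simulate(5_120, 1_024, bytesPerRow, stripeSize, 5_000, false));
        // Prints "per-batch size check: [1024, 1024, 1024, 1024, 1024]"
        System.out.println("per-batch size check: "
            + simulate(5_120, 1_024, bytesPerRow, stripeSize, 5_000, true));
    }
}
```

With only the row-count gate, the first size check happens after five 1024-row batches, so all 5120 rows (about 346 MB in this model) land in one stripe, matching the dump. Checking the size after every batch flushes after each batch instead. With rows this wide, even one batch (about 69 MB) slightly exceeds the assumed 64 MiB target, so batch-level checking bounds the stripe size but does not guarantee it stays under the target.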



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
