[ https://issues.apache.org/jira/browse/ORC-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned ORC-1986:
----------------------------------

    Assignee: Wan Kun

> Trigger flush stripe for large input rows
> -----------------------------------------
>
>                 Key: ORC-1986
>                 URL: https://issues.apache.org/jira/browse/ORC-1986
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Wan Kun
>            Assignee: Wan Kun
>            Priority: Major
>
> For large input rows, the stripe may be excessively large, requiring more
> memory for both reading and writing one stripe.
> We can check the tree write size in bytes and flush the stripe even when the
> input row count is less than 5000.
> {code:java}
> Stripes:
>   Stripe: offset: 3 data: 347494188 rows: 5120 tail: 244 index: 2304
>     Stream: column 0 section ROW_INDEX start: 3 length 12
>     Stream: column 1 section ROW_INDEX start: 15 length 110
>     Stream: column 2 section ROW_INDEX start: 125 length 893
>     Stream: column 3 section ROW_INDEX start: 1018 length 31
>     Stream: column 4 section ROW_INDEX start: 1049 length 65
>     Stream: column 5 section ROW_INDEX start: 1114 length 923
>     Stream: column 6 section ROW_INDEX start: 2037 length 25
>     Stream: column 7 section ROW_INDEX start: 2062 length 155
>     Stream: column 8 section ROW_INDEX start: 2217 length 28
>     Stream: column 9 section ROW_INDEX start: 2245 length 31
>     Stream: column 10 section ROW_INDEX start: 2276 length 31
>     Stream: column 1 section DATA start: 2307 length 81853
>     Stream: column 1 section LENGTH start: 84160 length 2191
>     Stream: column 2 section DATA start: 86351 length 345862763
>     Stream: column 2 section LENGTH start: 345949114 length 13736
>     Stream: column 3 section DATA start: 345962850 length 22
>     Stream: column 3 section LENGTH start: 345962872 length 6
>     Stream: column 3 section DICTIONARY_DATA start: 345962878 length 5
>     Stream: column 4 section PRESENT start: 345962883 length 200
>     Stream: column 4 section DATA start: 345963083 length 6322
>     Stream: column 4 section LENGTH start: 345969405 length 495
>     Stream: column 4 section DICTIONARY_DATA start: 345969900 length 2919
>     Stream: column 5 section DATA start: 345972819 length 1507883
>     Stream: column 5 section LENGTH start: 347480702 length 7346
>     Stream: column 6 section DATA start: 347488048 length 22
>     Stream: column 6 section LENGTH start: 347488070 length 6
>     Stream: column 6 section DICTIONARY_DATA start: 347488076 length 0
>     Stream: column 7 section DATA start: 347488076 length 5795
>     Stream: column 7 section LENGTH start: 347493871 length 301
>     Stream: column 7 section DICTIONARY_DATA start: 347494172 length 2187
>     Stream: column 8 section DATA start: 347496359 length 22
>     Stream: column 8 section LENGTH start: 347496381 length 6
>     Stream: column 8 section DICTIONARY_DATA start: 347496387 length 4
>     Stream: column 9 section DATA start: 347496391 length 58
>     Stream: column 9 section LENGTH start: 347496449 length 6
>     Stream: column 9 section DICTIONARY_DATA start: 347496455 length 7
>     Stream: column 10 section DATA start: 347496462 length 22
>     Stream: column 10 section LENGTH start: 347496484 length 6
>     Stream: column 10 section DICTIONARY_DATA start: 347496490 length 5
>     Encoding column 0: DIRECT
>     Encoding column 1: DIRECT_V2
>     Encoding column 2: DIRECT_V2
>     Encoding column 3: DICTIONARY_V2[1]
>     Encoding column 4: DICTIONARY_V2[661]
>     Encoding column 5: DIRECT_V2
>     Encoding column 6: DICTIONARY_V2[1]
>     Encoding column 7: DICTIONARY_V2[682]
>     Encoding column 8: DICTIONARY_V2[1]
>     Encoding column 9: DICTIONARY_V2[2]
>     Encoding column 10: DICTIONARY_V2[1]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
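
The idea in the issue description (check the buffered size in bytes instead of waiting for the 5000-row check) can be illustrated with a small simulation. This is only a sketch, not the ORC writer's actual code: the class `StripeFlushSketch`, the method `rowsAtFlush`, and all parameter names are hypothetical, the 64 MB stripe budget is an assumed value, and the average row size of 67870 bytes is an approximation derived from the dump in the issue (347494188 data bytes over 5120 rows).

```java
/** Hypothetical sketch of the two flush policies; not taken from the ORC code base. */
final class StripeFlushSketch {

  /**
   * Simulates appending rows of a fixed average size and returns how many rows
   * are buffered at the moment the stripe is flushed.
   *
   * @param avgRowBytes       assumed average size of one buffered row, in bytes
   * @param maxStripeBytes    assumed byte budget for one stripe
   * @param rowsPerCheck      row interval between size checks (5000 in the issue)
   * @param checkBytesEachRow true to also check the byte size after every row
   */
  static long rowsAtFlush(long avgRowBytes, long maxStripeBytes,
                          long rowsPerCheck, boolean checkBytesEachRow) {
    if (avgRowBytes <= 0 || maxStripeBytes <= 0 || rowsPerCheck <= 0) {
      throw new IllegalArgumentException("all sizes must be positive");
    }
    long rows = 0;
    long bufferedBytes = 0;
    while (true) {
      rows++;
      bufferedBytes += avgRowBytes;
      // Without the extra check, the size is only inspected every rowsPerCheck rows.
      boolean checkNow = checkBytesEachRow || rows % rowsPerCheck == 0;
      if (checkNow && bufferedBytes >= maxStripeBytes) {
        return rows;
      }
    }
  }

  public static void main(String[] args) {
    long avgRowBytes = 67870L;               // roughly 347494188 / 5120, from the dump
    long maxStripeBytes = 64L * 1024 * 1024; // assumed 64 MB budget
    System.out.println(rowsAtFlush(avgRowBytes, maxStripeBytes, 5000, false));
    System.out.println(rowsAtFlush(avgRowBytes, maxStripeBytes, 5000, true));
  }
}
```

Under these assumptions, the row-interval policy does not flush until 5000 rows (about 339 MB) are buffered, while the byte-size check flushes after 989 rows, just past the 64 MB budget. A real writer would more likely check per batch than per row; the per-row check here only keeps the sketch short.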