kkrugler opened a new issue #6311:
URL: https://github.com/apache/incubator-pinot/issues/6311
Currently, building a big segment fails. The log shows the failure happening while the segment is tarred, right after the “converting segment” phase:
```
Converting segment:
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0
to v3 format
v3 segment location for segment: crawldata_OFFLINE_2018-10-13_2020-10-11_0
is
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3
Deleting files in v1 segment directory:
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0
Computed crc = 1033854200, based on files
[/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/columns.psf,
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/index_map,
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/metadata.properties]
Driver, record read time : 236809
Driver, stats collector time : 0
Driver, indexing time : 122449
Tarring segment from:
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0
to:
/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0.tar.gz
Failed to generate Pinot segment for file -
s3://adbeat-pinot-files/compressed/3.gz
java.lang.RuntimeException: entry size ‘8991809155’ is too big ( >
8589934591 ).
at
org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.failForBigNumber(TarArchiveOutputStream.java:636)
~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
```
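The limit quoted in the exception, 8589934591, matches what a classic tar header can express: the size field holds 11 octal digits, so the largest representable entry size is 8^11 - 1 bytes. A minimal sketch of that arithmetic, where the class and method names are hypothetical and used only for illustration:

```java
public class TarSizeLimit {
    // A classic (ustar) tar header stores the entry size as 11 octal digits.
    static final int SIZE_FIELD_OCTAL_DIGITS = 11;

    // Largest size that fits in the field: 8^11 - 1 = 2^33 - 1 bytes.
    static long maxClassicEntrySize() {
        return (1L << (3 * SIZE_FIELD_OCTAL_DIGITS)) - 1;
    }

    // True if an entry of this size can be written without a big-number extension.
    static boolean fitsInClassicHeader(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= maxClassicEntrySize();
    }

    public static void main(String[] args) {
        long reportedEntrySize = 8991809155L; // value taken from the exception above
        System.out.println("max classic entry size = " + maxClassicEntrySize());
        System.out.println("reported entry fits    = " + fitsInClassicHeader(reportedEntrySize));
    }
}
```

The reported entry is roughly 400 MB over the limit, which is why the default big-number mode rejects it.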
As per https://commons.apache.org/proper/commons-compress/tar.html, Pinot
should set the `bigNumberMode` of its `TarArchiveOutputStream` to
`BIGNUMBER_POSIX` so that archive entries are not subject to the 8 GiB
(8589934591-byte) size limit.
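A rough sketch of what the suggested change could look like with Commons Compress. This is not Pinot's actual tarring code: the class name `PosixTarSketch` and the helper `tarGzSingleFile` are hypothetical, and the sketch archives a single file rather than a whole segment directory.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

public class PosixTarSketch {
    // Hypothetical helper: writes one file into a .tar.gz archive.
    static void tarGzSingleFile(File input, File outputTarGz) throws IOException {
        try (OutputStream fileOut = Files.newOutputStream(outputTarGz.toPath());
             BufferedOutputStream buffered = new BufferedOutputStream(fileOut);
             GzipCompressorOutputStream gzip = new GzipCompressorOutputStream(buffered);
             TarArchiveOutputStream tar = new TarArchiveOutputStream(gzip)) {
            // Use POSIX (PAX) extension headers for numbers that do not fit
            // in the classic header, instead of failing on large entries.
            tar.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
            // Optional and not part of the issue: also allow long file names.
            tar.setLongFileMode(TarArchiveOutputStream.LONGFILE_POSIX);

            TarArchiveEntry entry = new TarArchiveEntry(input, input.getName());
            tar.putArchiveEntry(entry);
            Files.copy(input.toPath(), tar); // stream the file contents into the entry
            tar.closeArchiveEntry();
            tar.finish();
        }
    }
}
```

Archives written this way rely on PAX headers for oversized entries, so whatever untars the segment needs to understand them; the Commons Compress page linked above discusses the compatibility trade-offs between the big-number modes.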