paolodelano opened a new issue, #2251:
URL: https://github.com/apache/hop/issues/2251
### Apache Hop version?
2.3.0
### Java version?
19.0.1
### Operating system
Windows
### What happened?
I used the Parquet File Input transform to read from S3. When the file is
larger than 2 GB, I get the following exception:
2023/02/07 09:26:55 - Parquet File Input.0 - ERROR: Unexpected error
2023/02/07 09:26:55 - Parquet File Input.0 - ERROR: org.apache.hop.core.exception.HopException:
2023/02/07 09:26:55 - Parquet File Input.0 - Error read file s3://bucket/file.parquet
2023/02/07 09:26:55 - Parquet File Input.0 - Negative initial size: -1383794380
2023/02/07 09:26:55 - Parquet File Input.0 -
2023/02/07 09:26:55 - Parquet File Input.0 - at org.apache.hop.parquet.transforms.input.ParquetInput.processRow(ParquetInput.java:101)
2023/02/07 09:26:55 - Parquet File Input.0 - at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55)
2023/02/07 09:26:55 - Parquet File Input.0 - at java.base/java.lang.Thread.run(Thread.java:1589)
2023/02/07 09:26:55 - Parquet File Input.0 - Caused by: java.lang.IllegalArgumentException: Negative initial size: -1383794380
2023/02/07 09:26:55 - Parquet File Input.0 - at java.base/java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:78)
2023/02/07 09:26:55 - Parquet File Input.0 - at org.apache.hop.parquet.transforms.input.ParquetInput.processRow(ParquetInput.java:84)
2023/02/07 09:26:55 - Parquet File Input.0 - ... 2 more
2023/02/07 09:26:55 - Parquet File Input.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
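We suspect a bug in the ParquetInput.java code below, where a long is cast
to an int. For sizes above Integer.MAX_VALUE (2,147,483,647 bytes) the cast
wraps around to a negative value, which ByteArrayOutputStream rejects: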
long size = fileObject.getContent().getSize();
InputStream inputStream = HopVfs.getInputStream(fileObject);
// Reads the whole file into memory...
//
ByteArrayOutputStream outputStream = new ByteArrayOutputStream((int) size);
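
To illustrate the suspected cause, here is a minimal standalone sketch that reproduces the same exception message without Hop or S3. The class name `NarrowingCastDemo`, the helper `checkedSize`, and the sample size 2,911,172,916 bytes are assumptions made for the example: that size is simply one value whose low 32 bits match the logged `-1383794380`, assuming the file is smaller than 4 GiB. This is not the Hop code and not a proposed patch.

```java
import java.io.ByteArrayOutputStream;

public class NarrowingCastDemo {

  // Same narrowing cast as in the quoted snippet: keeps only the low 32 bits.
  static int narrow(long size) {
    return (int) size;
  }

  // Hypothetical guard: fail early with a clear message instead of letting
  // the cast wrap around. A Java byte[] cannot exceed Integer.MAX_VALUE
  // elements (in practice slightly less), so larger files cannot be
  // buffered in a single array at all.
  static int checkedSize(long size) {
    if (size < 0 || size > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "File size " + size + " bytes does not fit in a single byte array");
    }
    return (int) size;
  }

  public static void main(String[] args) {
    long size = 2_911_172_916L; // assumed example size, about 2.7 GiB

    System.out.println(narrow(size)); // prints -1383794380

    try {
      new ByteArrayOutputStream(narrow(size));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints "Negative initial size: -1383794380"
    }

    try {
      checkedSize(size);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A guard like `checkedSize` only makes the failure explicit. Reading files above 2 GB would presumably require streaming the content rather than buffering the whole file in memory.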
### Issue Priority
Priority: 2
### Issue Component
Component: Transforms