paolodelano opened a new issue, #2251:
URL: https://github.com/apache/hop/issues/2251

   ### Apache Hop version?
   
   2.3.0
   
   ### Java version?
   
   19.0.1
   
   ### Operating system
   
   Windows
   
   ### What happened?
   
   I used the Parquet File Input transform to read a file from S3. When the 
   file is larger than 2 GB, I get the following exception:
   
   2023/02/07 09:26:55 - Parquet File Input.0 - ERROR: Unexpected error
   2023/02/07 09:26:55 - Parquet File Input.0 - ERROR: org.apache.hop.core.exception.HopException: 
   2023/02/07 09:26:55 - Parquet File Input.0 - Error read file s3://bucket/file.parquet
   2023/02/07 09:26:55 - Parquet File Input.0 - Negative initial size: -1383794380
   2023/02/07 09:26:55 - Parquet File Input.0 - 
   2023/02/07 09:26:55 - Parquet File Input.0 -         at org.apache.hop.parquet.transforms.input.ParquetInput.processRow(ParquetInput.java:101)
   2023/02/07 09:26:55 - Parquet File Input.0 -         at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55)
   2023/02/07 09:26:55 - Parquet File Input.0 -         at java.base/java.lang.Thread.run(Thread.java:1589)
   2023/02/07 09:26:55 - Parquet File Input.0 - Caused by: java.lang.IllegalArgumentException: Negative initial size: -1383794380
   2023/02/07 09:26:55 - Parquet File Input.0 -         at java.base/java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:78)
   2023/02/07 09:26:55 - Parquet File Input.0 -         at org.apache.hop.parquet.transforms.input.ParquetInput.processRow(ParquetInput.java:84)
   2023/02/07 09:26:55 - Parquet File Input.0 -         ... 2 more
   2023/02/07 09:26:55 - Parquet File Input.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
   
   
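   We suspect there is a bug in the ParquetInput.java code below, where a 
   long is cast to an int. For a file larger than Integer.MAX_VALUE bytes 
   (about 2 GB), the cast can wrap around to a negative value, which matches 
   the "Negative initial size" error above.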
   
         long size = fileObject.getContent().getSize();
         InputStream inputStream = HopVfs.getInputStream(fileObject);
   
         // Reads the whole file into memory...
         //
         ByteArrayOutputStream outputStream = new ByteArrayOutputStream((int) size);
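
To illustrate the suspected overflow, here is a minimal standalone sketch. It is not taken from the Hop code base: the class `SizeCastDemo`, the helper `checkedSize`, and the size 2911172916 are assumptions chosen for illustration. That size is just one value that narrows to the -1383794380 seen in the log; the real file size is not known from the report.

```java
import java.io.ByteArrayOutputStream;

public class SizeCastDemo {
    // Plain narrowing cast, as in the quoted snippet: only the low 32 bits survive.
    static int narrow(long size) {
        return (int) size;
    }

    // Hypothetical guard (not part of Hop): reject sizes that cannot be used
    // as an int buffer capacity instead of silently wrapping around.
    static int checkedSize(long size) {
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "File size " + size + " bytes does not fit in an in-memory byte array");
        }
        return (int) size;
    }

    public static void main(String[] args) {
        long size = 2_911_172_916L; // illustrative size above Integer.MAX_VALUE

        System.out.println(narrow(size)); // prints -1383794380

        try {
            // Same constructor call pattern as in the quoted snippet.
            new ByteArrayOutputStream(narrow(size));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Negative initial size: -1383794380
        }

        try {
            checkedSize(size);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A guard like this would only replace the confusing message with a clearer one. Reading files of this size would presumably require not buffering the whole file in a single byte array, since Java arrays are indexed by int.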
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: Transforms

