Vectorrent commented on issue #43275:
URL: https://github.com/apache/arrow/issues/43275#issuecomment-2237947651
After doing some more testing, I can confirm that this bug almost certainly
exists in `apache-arrow`, not in `parquet-wasm`. If you execute the above
script in Node.js, it quickly fails and writes the failing buffer to disk;
the failure occurs at the point where `apache-arrow` cannot convert that
buffer into a table correctly.
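For illustration, here is a minimal sketch of that failing step, assuming the buffer is an Arrow IPC stream handed over from `parquet-wasm`; the function name `tryParse` and the variable `arrowIpcBuffer` are hypothetical stand-ins for the actual script's code:
```js
// Minimal sketch (not the original script): parse an Arrow IPC stream
// buffer with apache-arrow and dump the bytes to disk if parsing fails.
import { writeFileSync } from 'node:fs';
import { tableFromIPC } from 'apache-arrow';

function tryParse(arrowIpcBuffer) {
  try {
    // This is the step that fails on certain buffers.
    const table = tableFromIPC(arrowIpcBuffer);
    console.log(`rows: ${table.numRows}, cols: ${table.numCols}`);
  } catch (err) {
    // Persist the exact bytes apache-arrow rejected so they can be
    // re-read with pyarrow (see the Python script below).
    writeFileSync('./arrowStreamBuffer.txt', arrowIpcBuffer);
    throw err;
  }
}
```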
However, if you run the following script, which reads that "failed" buffer
from disk with `pyarrow`, it parses correctly.
Thus, the bug is in `apache-arrow`, not in `parquet-wasm`.
```py
import pyarrow as pa
import pyarrow.ipc as ipc


def main():
    # Path to the failing buffer written to disk by the Node.js script
    arrow_file_on_disk = './arrowStreamBuffer.txt'

    try:
        # Open the Arrow IPC stream
        with ipc.open_stream(arrow_file_on_disk) as reader:
            # Read all the data into a table
            table = reader.read_all()

            # Print information about the table
            print(f"Table schema: {table.schema}")
            print(f"Number of columns: {table.num_columns}")
            print(f"Number of rows: {table.num_rows}")

            # Print the first 5 rows of the table
            print("\nFirst 5 rows of data:")
            print(table.to_pandas().head())
    except FileNotFoundError:
        print(f"Error: The file '{arrow_file_on_disk}' was not found.")
    except pa.ArrowInvalid:
        print(f"Error: '{arrow_file_on_disk}' is not a valid Arrow file or is corrupted.")
    except Exception as e:
        print(f"An error occurred: {str(e)}")


if __name__ == "__main__":
    main()
```