arthursunbao commented on issue #10885:
URL: https://github.com/apache/arrow/issues/10885#issuecomment-894037170


   Hi westonpace,
   
   Thanks for your quick response. 
   
   Our scenario is like this:
   
   We have a recommendation system and we want to transfer user data from Kafka and Hive to an online Redis-like store. We found that Arrow has good columnar storage capabilities and can deserialize data without parsing the entire payload (unlike Protobuf), so we use Arrow to serialize and compress the user data into binary with ArrowStreamWriter on the Kafka and Hive side.
   
   However, in our scenario the data schema is different for every user, so we want to keep an IDL schema for each user in an independent management system. Then, once the serialized data is in Redis, a third-party system that loads the Arrow-serialized bytes from Redis can look up that user's unique schema and deserialize the binary data with ArrowFileReader.
   
   We dug into the Arrow Java API and found that when writing data with ArrowFileWriter, we first need to do something like this:
   
   ```java
   RootAllocator allocator = new RootAllocator();
   VectorSchemaRoot schemaRoot = VectorSchemaRoot.create(UserSchema.schema(), allocator);
   FileOutputStream fileOutputStream = new FileOutputStream(FILE_PATH);
   ArrowFileWriter arrowFileWriter =
       new ArrowFileWriter(schemaRoot, null, fileOutputStream.getChannel());
   ```
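   For reference, here is the full round trip we experimented with, as a self-contained sketch (the `score` int column, the temp file, and the sample value are made up for illustration). Note that ArrowFileReader is not handed a schema: it appears to recover the schema from the IPC file itself:

   ```java
   import java.io.File;
   import java.io.FileInputStream;
   import java.io.FileOutputStream;
   import java.util.Collections;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.IntVector;
   import org.apache.arrow.vector.VectorSchemaRoot;
   import org.apache.arrow.vector.ipc.ArrowFileReader;
   import org.apache.arrow.vector.ipc.ArrowFileWriter;
   import org.apache.arrow.vector.types.pojo.ArrowType;
   import org.apache.arrow.vector.types.pojo.Field;
   import org.apache.arrow.vector.types.pojo.FieldType;
   import org.apache.arrow.vector.types.pojo.Schema;

   public class FileRoundTrip {
       public static void main(String[] args) throws Exception {
           File file = File.createTempFile("user-data", ".arrow");
           Schema schema = new Schema(Collections.singletonList(
                   new Field("score", FieldType.nullable(new ArrowType.Int(32, true)), null)));

           // Write one batch with a single row.
           try (RootAllocator allocator = new RootAllocator();
                VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
                FileOutputStream out = new FileOutputStream(file);
                ArrowFileWriter writer = new ArrowFileWriter(root, null, out.getChannel())) {
               IntVector scores = (IntVector) root.getVector("score");
               scores.allocateNew(1);
               scores.setSafe(0, 42);
               root.setRowCount(1);
               writer.start();
               writer.writeBatch();
               writer.end();
           }

           // Read it back: the reader constructor takes no schema, because
           // the Arrow IPC file format embeds the schema in the file.
           try (RootAllocator allocator = new RootAllocator();
                FileInputStream in = new FileInputStream(file);
                ArrowFileReader reader = new ArrowFileReader(in.getChannel(), allocator)) {
               VectorSchemaRoot root = reader.getVectorSchemaRoot();
               System.out.println(root.getSchema());
               while (reader.loadNextBatch()) {
                   System.out.println(root.getRowCount());
               }
           }
       }
   }
   ```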
   
   So basically, if a user uses the Java SDK, they need to keep a UserSchema.schema() Java file. What if the user wants to use the C++ SDK to read the schema instead: does that mean they need to maintain a matching C++ struct as well?
   
   Thanks in advance,
   Jason
   
   

