Baunsgaard commented on PR #1697:
URL: https://github.com/apache/systemds/pull/1697#issuecomment-1250143469

   Since the compression format tends to change a bit, the files written will 
not always be fully supported across different versions. One way to detect 
format changes or incompatible versions is to write an identifier at the 
beginning of each file, for example:
   
   - GitHash 
   - SystemDS version Number 
   
   Since the GitHash is not always available, we could use the SystemDS version 
number as a fallback. Personally, I do not like either solution; maybe someone 
else has suggestions?
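
   To make the identifier idea concrete, here is a minimal sketch, not the 
actual SystemDS implementation. The class `CompressedFileHeader`, the `MAGIC` 
value, and the `git:` / `version:` prefixes are all hypothetical names chosen 
for illustration; only the `java.io` calls are real APIs.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of SystemDS: writes and checks a small
// identifier at the start of a compressed file.
final class CompressedFileHeader {
    // Assumed magic number marking the file type (arbitrary value for this sketch).
    static final int MAGIC = 0x434C4142;

    // Prefer the git hash; fall back to the version number when it is unavailable.
    static String chooseIdentifier(String gitHash, String version) {
        return (gitHash != null && !gitHash.isEmpty())
            ? "git:" + gitHash
            : "version:" + version;
    }

    // Writes the magic number followed by the identifier string.
    static void write(DataOutputStream out, String identifier) throws IOException {
        out.writeInt(MAGIC);
        out.writeUTF(identifier);
    }

    // Reads the header and fails if the identifier differs from the expected one.
    static void check(DataInputStream in, String expected) throws IOException {
        if (in.readInt() != MAGIC)
            throw new IOException("Not a compressed matrix file");
        String found = in.readUTF();
        if (!found.equals(expected))
            throw new IOException(
                "Incompatible writer: found " + found + ", expected " + expected);
    }
}
```

   A strict equality check like this rejects every file written by a different 
build, which is safe but probably too strict for release versions; a real 
solution might instead compare a dedicated format version number.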
   
   Other design decisions:
   
   1. For the distributed case, I intend to simply write each compressed block 
to a different file, as we already do.
   2. Parallel reading and writing could be done with many files; for instance, 
I could split each column group into a separate file instead of multiple 
blocks. Perhaps someone has some experience or ideas?
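
   The one-file-per-column-group idea in point 2 could be sketched as follows. 
This is only an illustration under assumptions: each column group is assumed 
to be already serialized to a `byte[]`, and the class `ColumnGroupFiles` and 
the `group-<i>.bin` naming scheme are hypothetical, not existing SystemDS code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper, not part of SystemDS: stores each (already serialized)
// column group in its own file so groups can be written and read in parallel.
final class ColumnGroupFiles {
    // Assumed naming scheme: one file per column group index.
    static Path groupPath(Path dir, int i) {
        return dir.resolve("group-" + i + ".bin");
    }

    // Writes every column group to its own file using a parallel stream.
    static void writeAll(Path dir, List<byte[]> groups) throws IOException {
        Files.createDirectories(dir);
        IntStream.range(0, groups.size()).parallel().forEach(i -> {
            try {
                Files.write(groupPath(dir, i), groups.get(i));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Reads n column groups in parallel; the result keeps the index order.
    static List<byte[]> readAll(Path dir, int n) {
        return IntStream.range(0, n).parallel().mapToObj(i -> {
            try {
                return Files.readAllBytes(groupPath(dir, i));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).collect(Collectors.toList());
    }
}
```

   One trade-off of this layout is that a matrix with many small column groups 
produces many small files, which may be costly on distributed file systems.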
   
   Help / Comments appreciated
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
