Doug Cutting wrote:
Hairong Kuang wrote:
Another option is to create a checksum file per block at the data node
where the block is placed.
Yes, but then we'd need a separate checksum implementation for
intermediate data, and for other distributed filesystems that don't
already guarantee end-to-end data integrity. Also, a checksum per block
would not permit checksums on randomly accessed data without
re-checksumming the entire block. Finally, the checksum wouldn't be
end-to-end. We really want to checksum data as close to its source as
possible, then validate that checksum as close to its use as possible.
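
To make the random-access point concrete, here is a rough sketch (not taken from any existing code) of checksumming in small fixed-size chunks, so that a read at an arbitrary offset only needs to re-checksum the chunk it touches rather than the whole block. The class name, method names, and the 512-byte chunk size are all made up for illustration; CRC32 from java.util.zip is just a convenient stand-in for whatever checksum is actually used.

```java
import java.util.zip.CRC32;

// Hypothetical helper: names and chunk size are illustrative assumptions.
class ChunkedChecksum {
    static final int BYTES_PER_SUM = 512; // assumed chunk size

    /** Computes one CRC32 value per BYTES_PER_SUM-byte chunk of data. */
    static long[] sums(byte[] data) {
        int chunks = (data.length + BYTES_PER_SUM - 1) / BYTES_PER_SUM;
        long[] out = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            out[i] = crc(data, i * BYTES_PER_SUM);
        }
        return out;
    }

    /** Verifies only the chunk that contains the given offset. */
    static boolean verifyAt(byte[] data, long[] sums, int offset) {
        int chunk = offset / BYTES_PER_SUM;
        return crc(data, chunk * BYTES_PER_SUM) == sums[chunk];
    }

    /** CRC32 over the chunk starting at start (the last chunk may be short). */
    private static long crc(byte[] data, int start) {
        int len = Math.min(BYTES_PER_SUM, data.length - start);
        CRC32 c = new CRC32();
        c.update(data, start, len);
        return c.getValue();
    }
}
```

The writer would compute sums() as the data is produced and the reader would call verifyAt() just before using the bytes, which is the "close to the source, close to the use" idea. With one checksum per block, the same check would have to read and checksum the entire block.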
I'm guessing the big impediment is lack of support in Java, but it seems
like this would be a good application for the extended attributes,
alternate forks, or streams that so many file systems support these days.
JSR-203 ("NIO.2") adds multiple fork support and was approved more than
three years ago:
http://jcp.org/en/jsr/detail?id=203
At the time it was slated for JDK 1.5, but then got deferred until Java
7. The story is tortured:
http://forums.java.net/jive/thread.jspa?threadID=298&messageID=12696
http://en.wikipedia.org/wiki/New_I/O
With the open-sourcing of Java, it seems like the code for NIO.2 should
be available now or soon, though.
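
A minimal sketch of what storing a checksum in an extended attribute might look like, assuming the UserDefinedFileAttributeView interface from the JSR-203 API in java.nio.file.attribute, and assuming the underlying file system actually supports user attributes (many don't, or need a mount option). The class name and the attribute name "checksum.crc32" are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

// Hypothetical helper: class and attribute names are illustrative assumptions.
class XattrChecksum {
    static final String ATTR = "checksum.crc32"; // made-up attribute name

    /** True if the file store holding this file reports user-attribute support. */
    static boolean supported(Path file) throws IOException {
        return Files.getFileStore(file)
                .supportsFileAttributeView(UserDefinedFileAttributeView.class);
    }

    /** Stores the checksum as an 8-byte user-defined attribute on the file. */
    static void store(Path file, long crc) throws IOException {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(crc);
        buf.flip();
        view.write(ATTR, buf);
    }

    /** Reads the checksum back from the attribute. */
    static long load(Path file) throws IOException {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        ByteBuffer buf = ByteBuffer.allocate(view.size(ATTR));
        view.read(ATTR, buf);
        buf.flip();
        return buf.getLong();
    }
}
```

Callers would want to check supported() first and fall back to a side file when it returns false, which is part of why this doesn't remove the need for a portable checksum implementation.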
Ahh, here we go, Eclipse File System.
http://eclipsezone.com/eclipse/forums/t83786.html
This makes it sound like it might actually work:
http://wiki.eclipse.org/index.php/EFS#Local_file_system
Egad, there are more:
Extended Filesystem API (WebNFS)
http://docs.sun.com/app/docs/doc/806-1067/6jacl3e6g?a=view
NetBeans Filesystem API
http://www.netbeans.org/download/dev/javadoc/org-openide-filesystems/org/openide/filesystems/doc-files/api.html
Apache Commons VFS
http://jakarta.apache.org/commons/vfs/index.html
New I/O: Improved filesystem interface
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4313887
JSR 203: More New I/O APIs for the Java Platform ("NIO.2")
http://jcp.org/en/jsr/detail?id=203
IBM's AIO4J looks like a partial implementation, focused on the
asynchronous portion of the new API.
http://alphaworks.ibm.com/tech/aio4j
Somewhere in all that, it seems like there should be a nifty way to
handle this. But I can see that sorting it out is a big job. What a mess.
*sigh*
Jim