[
https://issues.apache.org/jira/browse/JCR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910300#comment-13910300
]
Thomas Mueller commented on JCR-3735:
-------------------------------------
Hm, you mean we create a new variant of FileDataStore that doesn't do
de-duplication? OK, that's an idea.
And of course the user might decide to pass a wrapped input stream
(BufferedInputStream or similar). In that case the instanceof check fails and
the underlying FileChannel is not reachable; I don't know of a good solution
for this.
One thing not to forget is that the input stream might not be positioned at the
very beginning. This can be supported by reading the channel position first.
I wrote some proof-of-concept code; maybe it is helpful:
{noformat}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;

public class FileChannelCopyTest {

    public static void main(String... args) throws Exception {
        String fileName = System.getProperty("user.home") + "/temp/test.txt";
        FileOutputStream out = new FileOutputStream(fileName);
        out.write("Hello World".getBytes("UTF-8"));
        out.close();
        InputStream in = new FileInputStream(fileName);
        // skip the first byte
        in.read();
        process(in);
    }

    static void process(InputStream in) throws Exception {
        if (!(in instanceof FileInputStream)) {
            // use default
            return;
        }
        FileInputStream fin = (FileInputStream) in;
        FileChannel c = fin.getChannel();
        // the channel position reflects the bytes already read from the stream
        long start = c.position();
        System.out.println("start: " + start);
        long length = c.size() - start;
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        ByteBuffer buff = ByteBuffer.allocate(64 * 1024);
        long pos = start;
        while (true) {
            int len = c.read(buff, pos);
            if (len < 0) {
                break;
            }
            pos += len;
            // only the first len bytes of the buffer were filled by this read
            digest.update(buff.array(), 0, len);
            buff.clear();
        }
        byte[] sha1 = digest.digest();
        // signum 1 keeps the hex string non-negative (no leading "-")
        String outFileName = System.getProperty("user.home") +
                "/temp/" + new BigInteger(1, sha1).toString(16) + ".txt";
        FileChannel out = new RandomAccessFile(outFileName, "rw").getChannel();
        while (length > 0) {
            long len = c.transferTo(start, length, out);
            // advance the source offset in case of a partial transfer
            start += len;
            length -= len;
        }
        out.close();
        c.close();
    }
}
{noformat}
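The "use default" branch in the proof of concept could fall back to a plain
buffered copy, which would also cover wrapped streams (without the zero-copy
benefit). Below is a minimal sketch of that idea, not the DataStore
implementation. It assumes Java 7 or later; the class name StreamCopy and the
method copyRemaining are hypothetical names chosen for illustration, and the
SHA-1 step is left out to keep it short.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
class StreamCopy {

    /**
     * Copies everything from the current position of 'in' to 'target'
     * and returns the number of bytes copied.
     */
    static long copyRemaining(InputStream in, File target) throws IOException {
        if (in instanceof FileInputStream) {
            // Fast path: transfer directly between file channels,
            // starting at the position the stream has already reached.
            FileChannel src = ((FileInputStream) in).getChannel();
            long pos = src.position();
            long remaining = src.size() - pos;
            long copied = 0;
            try (FileChannel dst = FileChannel.open(target.toPath(),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                while (remaining > 0) {
                    long n = src.transferTo(pos, remaining, dst);
                    if (n <= 0) {
                        // nothing more could be transferred; avoid spinning
                        break;
                    }
                    pos += n;
                    remaining -= n;
                    copied += n;
                }
            }
            return copied;
        }
        // Default path: any other stream (for example a BufferedInputStream)
        // is copied through a byte buffer.
        long copied = 0;
        byte[] buf = new byte[64 * 1024];
        try (OutputStream out = Files.newOutputStream(target.toPath())) {
            int n;
            while ((n = in.read(buf)) >= 0) {
                out.write(buf, 0, n);
                copied += n;
            }
        }
        return copied;
    }
}
```

Calling StreamCopy.copyRemaining(in, targetFile) after one byte has been read
from a FileInputStream copies only the bytes after that first one. The caller
stays responsible for closing the input stream.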
> Efficient copying of binaries in Jackrabbit DataStores
> ------------------------------------------------------
>
> Key: JCR-3735
> URL: https://issues.apache.org/jira/browse/JCR-3735
> Project: Jackrabbit Content Repository
> Issue Type: Improvement
> Components: jackrabbit-core
> Affects Versions: 2.7.4
> Reporter: Amit Jain
>
> In the DataStore implementations an additional temporary file is created for
> every binary uploaded. This step is an additional overhead when the upload
> process itself creates a temporary file.
> So, the solution proposed is to check if the input stream passed is a
> FileInputStream and then use the FileChannel object associated with the input
> stream to copy the file.