Hi,

>> Exactly. The DataStore should also check if the InputStream is a
>> DataStoreInputStream, so maybe it doesn't need to copy the binary:
>
> IMHO we should (and currently do) handle that on a higher level, by
> tracking the DataIdentifier in InternalValue.

Currently we don't detect that the binary already exists when using the
regular JCR API. With the DataStoreInputStream we could do that, and it
would be backward compatible. The DataStoreInputStream should be part of
the Jackrabbit API. Of course we can internally continue to use
InternalValue and BLOBFileValue to avoid using instanceof whenever
possible.

> The same goes for the proposed text extraction, virus scanning, etc.
> extensions. The DataRecord interface is not a public API so we could
> easily modify it to cover such needs. Using instanceof and overloading
> the input stream is a hack.

Just adding one extra stream to binary InternalValues would only solve
the text extraction problem, but it wouldn't be a generic solution.
There would be no way for a regular (user defined) EventListener to make
sure a unique binary is only processed once. The DataStoreInputStream
would provide such a solution.

Regards,
Thomas
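
P.S. A rough, self-contained sketch of the idea, in case it helps the
discussion. Only the DataStoreInputStream name comes from the proposal
above. Everything else (getIdentifier, SketchDataStore,
OncePerBinaryProcessor, the SHA-1 identifier) is hypothetical and made
up for illustration. None of it is existing Jackrabbit API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical: a stream that remembers which stored binary it came from.
class DataStoreInputStream extends FilterInputStream {
    private final String identifier;

    DataStoreInputStream(InputStream in, String identifier) {
        super(in);
        this.identifier = identifier;
    }

    String getIdentifier() {
        return identifier;
    }
}

// Hypothetical in-memory stand-in for a content-addressed data store.
class SketchDataStore {
    private final Map<String, byte[]> records = new HashMap<>();
    int copies = 0; // number of times a stream was actually read and stored

    String addRecord(InputStream in) {
        // If the stream came from this store, reuse the record without copying.
        if (in instanceof DataStoreInputStream) {
            String id = ((DataStoreInputStream) in).getIdentifier();
            if (records.containsKey(id)) {
                return id;
            }
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                out.write(buf, 0, n);
            }
            byte[] data = out.toByteArray();
            copies++;
            String id = sha1Hex(data);
            records.putIfAbsent(id, data);
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    DataStoreInputStream getStream(String id) {
        return new DataStoreInputStream(new ByteArrayInputStream(records.get(id)), id);
    }

    private static String sha1Hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical helper a user-defined listener could call so that each
// unique binary (text extraction, virus scan, ...) is handled only once.
class OncePerBinaryProcessor {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the binary behind this stream still needs processing.
    boolean shouldProcess(InputStream in) {
        if (in instanceof DataStoreInputStream) {
            return seen.add(((DataStoreInputStream) in).getIdentifier());
        }
        return true; // unknown origin: cannot deduplicate, so process it
    }
}
```

In this sketch, passing a stream obtained from getStream() back into
addRecord() returns the same identifier without a second copy, and a
listener using shouldProcess() would skip a binary it has already seen.
A plain InputStream still works as before, which is the backward
compatibility point.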