[moving dev-owner@ to BCC]

Forwarding to the Tika list.

From: Prateek Agarwal <[email protected]>
Date: Tuesday, May 16, 2017 at 6:35 AM
To: "[email protected]" <[email protected]>
Subject: TikaInputStream parse the content and write to OutputStream

Hi,

We have an upload API that uploads files to a server. Under a new requirement,
I have to scan the content for malware and, if none is found, store the file on
the server. The basic upload works fine; the problem starts when I add Apache
Tika.
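
Tika extracts content and detects file types; it is not an antivirus engine, so the "scan, then store only if clean" step needs a separate scanner. A minimal JDK-only sketch of that flow, assuming the upload fits on local disk: spool the upload to a temp file, scan it from a fresh stream, and move it into place only if it is clean. `ContentScanner` and `ScanThenStore` are hypothetical names, and the real scanner behind `isClean` is left open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical hook for whatever malware scanner is used; not a Tika API. */
interface ContentScanner {
    /** Returns true if the content is considered clean. */
    boolean isClean(InputStream content) throws IOException;
}

final class ScanThenStore {
    private ScanThenStore() {}

    /**
     * Copies the upload to a temp file, scans it from a fresh stream, and
     * moves it to {@code target} only if the scanner reports it clean.
     * Returns true if the file was stored.
     */
    static boolean storeIfClean(InputStream upload, Path target, ContentScanner scanner)
            throws IOException {
        // Spool next to the target so the final move is a cheap rename.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "upload-", ".tmp");
        try {
            // REPLACE_EXISTING is needed because createTempFile already made the file.
            Files.copy(upload, tmp, StandardCopyOption.REPLACE_EXISTING);
            boolean clean;
            try (InputStream in = Files.newInputStream(tmp)) {
                clean = scanner.isClean(in);
            }
            if (!clean) {
                return false;
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } finally {
            // Removes the temp file on rejection or error; no-op after a successful move.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the bytes sit on disk, the scanner and the Tika parser can each open their own stream on the temp file instead of sharing the single upload stream.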

1.    How can we tell whether the file is malware?

2.    I can get the content from the Tika parser, but the file stored on the
server is zero bytes. Do I have to clone the input stream, one copy for the
Tika parser and one for the output stream?
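
On question 1: Tika has no notion of "malware". One common option is a ClamAV daemon (clamd), which accepts file bytes over a socket through its INSTREAM command: the null-terminated command `zINSTREAM`, then chunks each prefixed by its length as a 4-byte big-endian unsigned integer, then a zero-length chunk to end the stream. The reply looks like `stream: OK` or `stream: <signature> FOUND`. The sketch below only frames the request and interprets the reply; opening the socket (clamd's host and port) and clamd's `StreamMaxLength` limit are deployment assumptions left out here, and `ClamdInstream` is a hypothetical name.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ClamdInstream {
    private ClamdInstream() {}

    /**
     * Writes a clamd INSTREAM request: the null-terminated "zINSTREAM" command,
     * then the content as length-prefixed chunks, then a zero-length chunk.
     */
    static void writeRequest(InputStream content, OutputStream socketOut, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        DataOutputStream out = new DataOutputStream(socketOut);
        out.write("zINSTREAM\0".getBytes(StandardCharsets.US_ASCII));
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = content.read(buf)) != -1) {
            if (n == 0) {
                continue; // never send an empty chunk: clamd treats it as end of stream
            }
            out.writeInt(n); // DataOutputStream writes ints big-endian (network order)
            out.write(buf, 0, n);
        }
        out.writeInt(0); // zero-length chunk terminates the stream
        out.flush();
    }

    /** Interprets a reply such as "stream: OK" or "stream: <name> FOUND". */
    static boolean isClean(String reply) {
        String r = reply.replace("\0", "").trim();
        if (r.endsWith(" OK")) {
            return true;
        }
        if (r.endsWith(" FOUND")) {
            return false;
        }
        // Anything else (for example a size-limit ERROR) is neither clean nor infected.
        throw new IllegalStateException("Unexpected clamd reply: " + r);
    }
}
```

In use, `writeRequest` would receive the socket's output stream, and the reply read back from the socket's input stream would be passed to `isClean`.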

Code:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.logging.Level;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.xml.sax.SAXException;

try (final BufferedInputStream input = new BufferedInputStream(pInputStream, bytesSize);
        final BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream(pObjectFile), bytesSize);
        final TikaInputStream stream = TikaInputStream.get(input)) {
    try {
        // Parse the file with Tika
        parser.parse(stream, handler, metadata, context);
        LOGGER.log(Level.INFO, "File content - {0}", handler.toString());
    } catch (IOException | SAXException | TikaException ex) {
        LOGGER.log(Level.SEVERE, "Tika parsing failed", ex);
    }
    byte[] buffer = new byte[bytesSize];
    // Tried input.read as well as stream.read; neither works
    int length;
    while ((length = stream.read(buffer)) != -1) {
        output.write(buffer, 0, length);
        bytesWritten += length;
    }
}
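
In the code above, `parser.parse(...)` reads `stream` (usually to the end) before the copy loop runs, so `stream.read(buffer)` most likely returns -1 at once and nothing is written. Instead of cloning the stream, one option is to copy bytes to the output as the parser reads them, then drain whatever the parser left unread. A minimal sketch, assuming the whole upload should be stored byte for byte; `CopyingInputStream` is a hypothetical name (Commons IO ships a similar `org.apache.commons.io.input.TeeInputStream`).

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Copies every byte read from the wrapped stream to a branch OutputStream. */
final class CopyingInputStream extends FilterInputStream {
    private final OutputStream branch;

    CopyingInputStream(InputStream in, OutputStream branch) {
        super(in);
        this.branch = branch;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            branch.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) delegates here, so it is covered too.
        int n = super.read(buf, off, len);
        if (n > 0) {
            branch.write(buf, off, n);
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Route skips through read() so skipped bytes still reach the copy.
        if (n <= 0) {
            return 0;
        }
        byte[] tmp = new byte[(int) Math.min(8192, n)];
        long remaining = n;
        while (remaining > 0) {
            int r = read(tmp, 0, (int) Math.min(tmp.length, remaining));
            if (r == -1) {
                break;
            }
            remaining -= r;
        }
        return n - remaining;
    }

    // Refuse mark/reset: bytes re-read after a reset would be copied twice.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readlimit) {
        // no-op
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Reads whatever the consumer left unread so the copy is complete. */
    long drain() throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = read(buf, 0, buf.length)) != -1) {
            total += n;
        }
        return total;
    }
}
```

In the upload code this would mean wrapping `input` as `new CopyingInputStream(input, output)`, passing `TikaInputStream.get(...)` of that wrapper to `parser.parse(...)`, and calling `drain()` before `output` is closed. The copy is written before the content has been checked, so it should go to a temporary location rather than the final one.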

I've also asked the same question on Stack Overflow (SOF).

~

Prateek Agarwal