[ 
https://issues.apache.org/jira/browse/TIKA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959775#comment-14959775
 ] 

Bob Paulin commented on TIKA-1762:
----------------------------------

Hey Nick,

Just so we're clear on what's being proposed are you thinking something like 
this with respect to the ExternalParser:

{code}
public class ExternalParser extends AbstractParser {
    
    private final ExecutorService executorService;
    
    public ExternalParser() {
        this.executorService = Executors.newSingleThreadExecutor();
    }
    
    public ExternalParser(TikaConfig config)
    {
        this.executorService = config.getExecutorService();
    }   
    
    public ExternalParser(ExecutorService executorService)
    {
        this.executorService = executorService
    }
...
{code}

This does at least provide a default.  For example the following:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Thread t = new Thread() {
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        };
        t.start();
        try{
           t.join();
        }
        catch(InterruptedException ignore){}        
    }
{code}

Could be replaced with:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Future result = executorService.submit(new Runnable() {
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        });
        try{
           result.get();
        }
        catch(InterruptedException ignore){}        
    }
{code}

The approach I was originally thinking of would not require an executor service 
to be set as a default but would replace replace the Thread with:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Future result = ConcurrentUtils.execute(new Runnable(){
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        });
        try{
           result.get();
        }
        catch(InterruptedException ignore){}        
    }
{code}
The code above is not as elegant but doesn't require a default thread pool at 
instantiation.  The ConcurrentUtil method could also be modified to take the 
ExecutorService directly so it could come from the Parser's field instead of 
only the ParseContext.  Then a null ExecutorService could fall back to just 
creating threads.  Any opinions on which is worse: Spinning up a number of 
threads that are infrequently used in a Pool or creating threads on the fly the 
way we do now? 

> Create Executor Service from TikaConfig
> ---------------------------------------
>
>                 Key: TIKA-1762
>                 URL: https://issues.apache.org/jira/browse/TIKA-1762
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Bob Paulin
>             Fix For: 1.11
>
>
> Create a configurable executor service that is configurable from the 
> TikaConfig.
>  Konstantin Gribov added a comment - 23/Sep/15 09:55
> Bob Paulin, I have two ideas on the issue:
>     by default use common thread pool, configured via and contained in 
> TikaConfig as Tyler Palsulich suggested,
>     you can pass thread pool for parser invocation via ParserContext with 
> fallback to default if now thread pool/executor service in context.
> Also o.a.tika.Tika#parse(InputStream, Metadate) produces 
> o.a.tika.parser.ParsingReader and anonymous Executor with unbounded daemon 
> thread creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to