Hi Marshall,

we can do the File -> URI -> URL conversion (to handle spaces in the path string) inside the FileSystemCollectionReader. That way the TikaWrapper keeps the old populateCASFromURL API instead of switching to populateCASFromURI.

Tommaso
p.s.: we definitely need more test cases for some Sandbox projects :-)

2010/9/21 Marshall Schor <[email protected]>

> I noticed that the patch changes a public API (populateCASfromURL). This will
> break backwards compatibility, if anyone has code that is depending on that API.
>
> If there is a convenient way to implement the fix without changing the APIs, I
> think our users may prefer that :-) .
>
> -Marshall
>
> On 9/20/2010 1:52 AM, Tommaso Teofili (JIRA) wrote:
> > [ https://issues.apache.org/jira/browse/UIMA-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >
> > Tommaso Teofili resolved UIMA-1878.
> > -----------------------------------
> >
> > Resolution: Fixed
> >
> >> TikaAnnotator doesn't handle spaces in path string
> >> --------------------------------------------------
> >>
> >> Key: UIMA-1878
> >> URL: https://issues.apache.org/jira/browse/UIMA-1878
> >> Project: UIMA
> >> Issue Type: Bug
> >> Components: Sandbox-TikaAnnotator
> >> Affects Versions: 2.3
> >> Environment: Windows
> >> Reporter: Greg Holmberg
> >> Attachments: TikaAnnotator-patch.txt
> >>
> >>
> >> If you give a value for InputDirectory that contains a space, then TikaAnnotator silently does nothing.
> >> This is because File objects are converted directly to a URL, and openStream() fails because the space character wasn't converted to %20.
> >> When this happens, the exception is ignored and the CAS text is set to "".
> >> It would be better to convert the File object to a URI and the URI to a URL. This will convert the space character correctly.
> >> Secondly, it would be better to throw an exception rather than silently ignore it.
> >> A suggested patch is attached.
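
For reference, here is a minimal sketch of the File -> URI -> URL conversion discussed in this thread. The class name FileToUrlSketch, the method name toEscapedUrl and the sample path are made up for illustration and are not taken from the TikaAnnotator code or the attached patch. Only standard JDK calls (File.toURI() and URI.toURL()) are used.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, for illustration only; not part of the TikaAnnotator sources.
final class FileToUrlSketch {

    private FileToUrlSketch() {
    }

    // Going through URI percent-encodes characters that are illegal in a URL,
    // so a space in the path becomes %20 in the resulting file: URL.
    static URL toEscapedUrl(File file) throws MalformedURLException {
        return file.toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Sample path with spaces; the file does not need to exist for the conversion.
        File file = new File("input dir", "some doc.txt");
        System.out.println(toEscapedUrl(file));
    }
}
```

If the reader does this conversion before handing the URL on, the URL-based method can keep its current signature and callers of the public API are not affected.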
