Thanks, Govind.
Can you please log a ticket in the issue tracker?
https://issues.apache.org/jira/projects/ANY23
Thank you

On 2018/11/30 14:21:30, govind nitk <[email protected]> wrote:
> one observation with cli tool:
> *any23 2.2 *
> *./bin/any23 rover "https://www.bbc.com/sport/football/46377603
> <https://www.bbc.com/sport/football/46377603>" -o /tmp/any23_2.2*
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
>
> Nov 30, 2018 7:45:32 PM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> TIFFImageWriter not loaded. tiff files will not be processed
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
>
> Nov 30, 2018 7:45:32 PM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> 0 [main] INFO org.apache.any23.rdf.PopularPrefixes - Loading prefixes
> from /org/apache/any23/prefixes/prefixes.properties
> 1113 [main] INFO org.apache.any23.extractor.SingleDocumentExtraction -
> Processing https://www.bbc.com/sport/football/46377603
> 3127 [main] INFO org.apache.any23.cli.Rover - Extractors used:
> [html-head-meta, html-head-title, html-rdfa11]
> 3127 [main] INFO org.apache.any23.cli.Rover - 55 triples, 3083ms
>
> ------------------------------------------------------------------------
> Apache Any23 SUCCESS
> Total time: 4s
> Finished at: Fri Nov 30 19:45:35 IST 2018
> Final Memory: 40M/143M
> ------------------------------------------------------------------------
>
>
>
> *with any23 2.3 snapshot cli released locally:*
> */bin/any23 rover "https://www.bbc.com/sport/football/46377603
> <https://www.bbc.com/sport/football/46377603>" -o /tmp/any23_2.3*
>
> 1 [main] ERROR org.apache.any23.writer.WriterFactoryRegistry - Found
> error loading a WriterFactory
> java.util.ServiceConfigurationError: org.apache.any23.writer.WriterFactory:
> Provider org.apache.any23.cli.flows.PeopleExtractorFactory not found
> at java.util.ServiceLoader.fail(ServiceLoader.java:239)
> at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at
> org.apache.any23.writer.WriterFactoryRegistry.<init>(WriterFactoryRegistry.java:90)
> at
> org.apache.any23.writer.WriterFactoryRegistry$InstanceHolder.<clinit>(WriterFactoryRegistry.java:54)
> at
> org.apache.any23.writer.WriterFactoryRegistry.getInstance(WriterFactoryRegistry.java:129)
> at org.apache.any23.cli.Rover.<clinit>(Rover.java:76)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.lang.Class.newInstance(Class.java:442)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at org.apache.any23.cli.ToolRunner.execute(ToolRunner.java:95)
> at org.apache.any23.cli.ToolRunner.execute(ToolRunner.java:72)
> at org.apache.any23.cli.ToolRunner.main(ToolRunner.java:68)
>
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
>
> 2244 [main] WARN org.apache.http.client.protocol.ResponseProcessCookies -
> Invalid cookie header: "Set-Cookie:
> BBC-UID=ca727e6c3a3b33f842e8878f6fafd0e83567ff24f7978b58d536e1eec83ce2590Any23-CLI;
> expires=Tue, 29 Nov 2022 14:15:41 GMT; path=/; domain=.bbc.com". Invalid
> 'expires' attribute: Tue, 29 Nov 2022 14:15:41 GMT
> 4384 [main] INFO org.apache.any23.cli.Rover - Extractors used:
> [html-head-meta, html-scraper, html-head-title, html-rdfa11]
> 4384 [main] INFO org.apache.any23.cli.Rover - 59 triples, 2568ms
>
> ------------------------------------------------------------------------
> Apache Any23 SUCCESS
> Total time: 4s
> Finished at: Fri Nov 30 19:45:43 IST 2018
> Final Memory: 75M/187M
> ------------------------------------------------------------------------
>
>
> with snapshot released locally, it starts with
> *[main] ERROR org.apache.any23.writer.WriterFactoryRegistry - Found error
> loading a WriterFactory*
>
>
>
>
> On Thu, Nov 29, 2018 at 11:56 PM lewis john mcgibbney <[email protected]>
> wrote:
>
> > Hi dev@,
> > Is there anything else we want to include in the 2.3 development drive or
> > can we go ahead and produce a release candidate?
> > Thanks
> > Lewis
> >
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
> >
>
