Repository: any23
Updated Branches:
  refs/heads/master e6e136fc1 -> 4a302311d
updating files for release

Project: http://git-wip-us.apache.org/repos/asf/any23/repo
Commit: http://git-wip-us.apache.org/repos/asf/any23/commit/4a302311
Tree: http://git-wip-us.apache.org/repos/asf/any23/tree/4a302311
Diff: http://git-wip-us.apache.org/repos/asf/any23/diff/4a302311

Branch: refs/heads/master
Commit: 4a302311d591314323551b2c038dc028b9f313fa
Parents: e6e136f
Author: Lewis John McGibbney <[email protected]>
Authored: Fri Feb 3 20:38:43 2017 -0800
Committer: Lewis John McGibbney <[email protected]>
Committed: Fri Feb 3 20:38:43 2017 -0800

----------------------------------------------------------------------
 NOTICE.txt                                     |   4 +
 README.txt                                     | 165 -------------------
 RELEASE-NOTES.txt                              |  58 +++++++
 plugins/README.txt                             |  71 --------
 .../src/main/assembly/NOTICE-with-deps.txt     |   2 +-
 .../src/main/assembly/NOTICE-with-deps.txt     |   2 +-
 .../src/main/assembly/NOTICE-with-deps.txt     |   2 +-
 pom.xml                                        |   2 +-
 .../main/assembly/NOTICE-server-embedded.txt   |   2 +-
 service/src/main/assembly/NOTICE-with-deps.txt |   2 +-
 .../src/main/assembly/NOTICE-without-deps.txt  |   2 +-
 11 files changed, 69 insertions(+), 243 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/NOTICE.txt
----------------------------------------------------------------------
diff --git a/NOTICE.txt b/NOTICE.txt
index 6d3bcec..d24cec8 100644
--- a/NOTICE.txt
+++ b/NOTICE.txt
@@ -10,3 +10,7 @@ Foundation (http://jquery.org/) under an MIT license.
 
 This product includes software developed by
 Eclipse RDF4J (http://rdf4j.org/) under the Eclipse Distribution License v1.0.
+
+This product includes software developed by Andrey Somov
+(https://bitbucket.org/asomov/snakeyaml) under the Apache License
+v2.0

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/README.txt
----------------------------------------------------------------------
diff --git a/README.txt b/README.txt
deleted file mode 100644
index ea8ebc1..0000000
--- a/README.txt
+++ /dev/null
@@ -1,165 +0,0 @@
-
- ::: :::: ::: ::: ::: :::::::: ::::::::
- :+: :+: :+:+: :+: :+: :+: :+: :+: :+: :+:
- +:+ +:+ :+:+:+ +:+ +:+ +:+ +:+ +:+
- +#++:++#++: +#+ +:+ +#+ +#++: +#+ +#++:
- +#+ +#+ +#+ +#+#+# +#+ +#+ +#+
- #+# #+# #+# #+#+# #+# #+# #+# #+#
-### ### ### #### ### ########## ########
-
-Apache Anything To Triples (Any23) is a library and web service that extracts
-structured data in RDF format from a variety of Web documents.
-Any23 documentation can be found on the [website](http://any23.apache.org)
-
-# Distribution Content
-
-api                 Any23 library external API.
-core                The library core codebase.
-csvutils            A CSV specific package
-encoding            Encoding detection library.
-mime                MIME Type detection library.
-nquads              NQuads parsing and serialization library.
-plugins             Library plugins codebase (read plugins/README.txt for further details).
-service             The library HTTP service codebase.
-src                 Packing of Any23 artifacts.
-test-resources      Material relating to Any23 JUnit test cases.
-RELEASE-NOTES.txt   File reporting main release notes for every version.
-LICENSE.txt         Applicable project license.
-README.md           This file.
-
-# Online Documentation
-
-For details on the command line tool and web interface, see:
- http://any23.apache.org/getting-started.html
-
-For a guide to using Any23 as a library in your Java applications, see:
- http://any23.apache.org/developers.html
-
-Javadocs is available here:
- http://any23.apache.org/apidocs/
-
-# Community
-
-You can reach our and connect with our community on our [mailing lists](http://any23.apache.org/mail-lists.html)
-
-# Build Any23 from Source Code
-
-The canonical Any23 source code lives at the [Apache Software Foundation Git repository](https://git-wip-us.apache.org/repos/asf/any23.git).
-
-Be sure to have the [Apache Maven v.3.x+](http://maven.apache.org/) installed and included in $PATH.
-
-## Clone the source:
-```
-git clone https://git-wip-us.apache.org/repos/asf/any23.git
-```
-## Navigate and build:
-```
-cd any23
-mvn clean install
-``
-From now on any23 is refered to as $ANY23_HOME`
-This will install the Any23 artifacts and its dependencies in your
-local Maven3 repository.
-You can then extract the compiled code and use the command line interface
-Please note you will need to change the version to the tar or zip you are extracting.
-```
-tar -zxvf $ANY23_HOME/core/target/apache-any23-core-${version-SNAPSHOT}.tar.gz
-```
-# Run the Any23 Commandline Tools
-
-Any23 comes with some command line tools. Within the directory you just extracted, you can invoke:
-Linux
-```
-$ANY23_HOME/core/target/apache-any23-core-${version-SNAPSHOT}/bin/any23 # Provides the main Any23 use case: metadata extraction on a file or URL source.
-```
-Windows
-```
-$ANY23_HOME/core/target/apache-any23-core-${version-SNAPSHOT}/bin/any23.bat # Provides the main Any23 use case: metadata extraction on a file or URL source.
-```
-The complete documentation about these tools can be found [here](http://any23.apache.org/getting-started.html)
-
-The bin scripts are generated dynamically during the package phase.
-To ensure the package generation, from the top level directory run:
-```
-mvn package
-```
-You can void extracting the archive files by going to the core generated bin folder
-```
-cd $ANY23_HOME/core/target/appassembler/bin/
-```
-and finally invoke the script for your OS (UNIX or Windows):
-
- bin$ ./any23
- [usage instructions will be printed out]
-
-# Run the Any23 Web Service
-
-Any23 can be run as a service.
-To run the Any23 service go to the service dir
-and then invoke the embedded Jetty server
-```
-cd $ANY23_HOME/service
-mvn jetty:run
-```
-You can check the service is running by accessing [http://localhost:8080/](http://localhost:8080/) with your browser.
-
-The complete documentation about this service can be found [here](http://any23.apache.org/getting-started.html)
-
-# Build the Any23 Web Service WAR
-
-The Any23 Service WAR by default will be generated as self-contained, all the dependencies will be included as JAR within the WEB-INF/lib archive dir.
-
-To generate the self contained WAR invoke from the service dir:
-```
-service$ mvn [-o] [-Dmaven.test.skip=true] clean package
-```
-Where -o will build the process offline, and -Dmaven.test.skip=true
-will force the test skipping.
-
-The WAR will be generated in
-```
-$ANY23_HOME/service/target/any23-service-x.y.z-SNAPSHOT.war
-```
-To produce a instead a WAR WITHOUT the included JAR dependencies it is possible to use
-the war-without-deps profile:
-```
-any23-service$ mvn [-o] [-Dmaven.test.skip=true] clean package
-```
-The option [-o] will speed up the module build if you have already
-collected all the required dependencies.
-
-The option [-Dmaven.test.skip=true] will disable tests.
-
-Again the various versions of the WAR will be generated into
-```
-$ANY23_HOME/service/target/apache-any23-service-x.y.z-*
-```
-
-## Any23 Web Service Tracker Disclaimer
-
-The Any23 Web Service form (service/src/main/resources/form.html) contains a Google Analytics Tracker which is
-by default configured to report to the Any23 Community. It is possible to change the user ID modifying the
-```form.tracker.id``` property in parent POM.
-
-# Generate the Documentation
-
-To generate the project site locally execute the following command from $ANY23_HOME:
-```
-cd $ANY23_HOME
-MAVEN_OPTS='-Xmx1024m' mvn [-o] clean site:site
-```
-You can speed up the site generation process specifying the offline option [-o],
-but it works only if all the involved plugin dependencies has been already downloaded
-in the local M2 repository.
-
-If you're interested in generating the Javadoc enriched with navigable UML graphs, you can activate
-the umlgraphdoc profile. This profile relies on [graphviz](http://www.graphviz.org/) that must be
-installed in your system.
-```
-cd $ANY23_HOME
-MAVEN_OPTS='-Xmx1024m' mvn -P umlgraphdoc clean site:site
-```
-
-# Munging of Any23 code to ASF
-
-When it was [decided](http://wiki.apache.org/incubator/Any23Proposal) that the Any23 code be brought into the Apache Incubator, the existing code was migrated over to the ASF infrastructure and documented/managed via a number of Jira tickets e.g, [INFRA-3978](https://issues.apache.org/jira/browse/INFRA-3978) [INFRA-4146](https://issues.apache.org/jira/browse/INFRA-4146) and [ANY23-29](https://issues.apache.org/jira/browse/ANY23-29).

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/RELEASE-NOTES.txt
----------------------------------------------------------------------
diff --git a/RELEASE-NOTES.txt b/RELEASE-NOTES.txt
index edd2190..9b9477c 100644
--- a/RELEASE-NOTES.txt
+++ b/RELEASE-NOTES.txt
@@ -1,3 +1,61 @@
+ Apache Any23 2.0
+ Release Notes
+ 03/02/2017 (dd/mm/yyy)
+Sub-task
+
+    [ANY23-243] - Overhaul and update README.txt
+
+Bug
+
+    [ANY23-79] - No execute permissions in command line tool
+    [ANY23-92] - NQuadsParser does not require whitespace between elements
+    [ANY23-99] - NQuadsWriter should force ASCII in OutputStream constructor
+    [ANY23-153] - Automatically Generate EARL reports for Any23 RDF Parsers
+    [ANY23-176] - DOC: Apache Any23 Installation Guide
+    [ANY23-200] - Build revision is not correctly defined
+    [ANY23-219] - rover is does not work with -f nquads option
+    [ANY23-235] - NQuads links broken on Supported Formats Page
+    [ANY23-236] - Port Any23 site to Apache CMS
+    [ANY23-248] - NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14
+    [ANY23-252] - JSON-LD format MIME type is not detected
+    [ANY23-253] - JSON-LD cannot be processed by Rover
+    [ANY23-255] - apache-any23-quads dependency should not be <scope> test in core pom.xml
+    [ANY23-265] - ThreadSafety issue in ItemPropValue
+    [ANY23-272] - Service fails to start with any23server.bat
+    [ANY23-277] - Any23 master branch will not build to to build due to lacking maven-assembly-plugin
+    [ANY23-279] - Fix EmbeddedJSONLDExtractor ExtractorDescription getDescription() implementation
+    [ANY23-296] - Tar complains about groupid value being too big
+    [ANY23-302] - rover JSON output is not valid
+
+Improvement
+
+    [ANY23-80] - Split out command line tools into a separate module
+    [ANY23-163] - VocabPrinter tool broken with No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq)
+    [ANY23-185] - Add missing <meta> element attributes to HTMLMetaExtractor
+    [ANY23-207] - Implement Microformats2
+    [ANY23-246] - Add Open Graph Protocol and Facebook prefixes to popular.prefixes
+    [ANY23-247] - FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.
+    [ANY23-250] - Upgrade to Tika 1.7
+    [ANY23-261] - Tiny typo in Data Extraction documentation source example
+    [ANY23-263] - Upgrade to Tika 1.14
+    [ANY23-274] - Change any23.microdata.ns.default configuration value to http://schema.org
+    [ANY23-276] - Upgrade sesame dependencies to RDF4J
+    [ANY23-278] - Upgrade all Maven plugin versions in parent pom.xml
+    [ANY23-293] - Package log4j configuration with core appassembler
+    [ANY23-297] - Any23 doesn't build under JDK1.8
+    [ANY23-299] - Missing YAML to RDF parser
+    [ANY23-300] - Ignore NetBeans configuration files
+
+Task
+
+    [ANY23-141] - Upgrade OpenRDF Sesame to 2.7.0
+    [ANY23-242] - Address issues with 1.1 #1 RC
+
+Wish
+
+    [ANY23-19] - Abstract away any specific RDF APIs
+    [ANY23-226] - Extract JSON-LD embedded in HTML
+
 Apache Any23 1.1
 Release Notes
 15/10/2014 (dd/mm/yyyy)

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/plugins/README.txt
----------------------------------------------------------------------
diff --git a/plugins/README.txt b/plugins/README.txt
deleted file mode 100644
index 11f2206..0000000
--- a/plugins/README.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-=============
-Any23 Plugins
-=============
-
-This is the root dir of the Any23 Plugins module.
-
-A plugin is an extension of the Any23 core and can be plugged using
-the Plugin Manager capabilities.
-
-Plugins
-=======
-
-basic-crawler
-------------
-
-A CLI tool which extends the Rover CLI adding crawler specific
-capabilities.
-
-html-scraper
------------
-
-The HTML scraper is able to convert any HTML page to triples
-containing the text scraped from the page.
-
-office-scraper
--------------
-
-The Office scraper is able to convert the main MS Office compatible
-formats and convert them to triples.
-
-integration-test
----------------
-
-This module contains the integration tests for all the defined plugins.
-
-Generate Plugin Packaging
-=========================
-
-To generate the desired plugin package, navigate to the plugin directory and execute 'mvn package'
-e.g. to generate the basic-crawler plugin package
-
-$cd $ANY23-HOME/plugins/basic-crawler
-$ mvn package
-
-From the basic-crawler directory this generates
-
-.
-|-- pom.xml
-|-- src
-|   |-- main
-|   |   |-- assembly
-|   |   `-- java
-|   `-- test
-`-- target
-    |-- any23-basic-crawler-${version}.jar
-    |-- apache-any23-basic-crawler-${version}-bin.tar.gz <<<
-    |-- apache-any23-basic-crawler-${version}-bin.zip <<<
-    |-- archive-tmp
-    |-- classes
-    |   |-- META-INF
-    |   `-- org
-    |-- generated-sources
-    |-- maven-archiver
-    |-- maven-shared-archive-resources
-    |-- surefire
-    |-- surefire-reports
-    `-- test-classes
-...
-
-Plugin specific README's can be found in either ./target/*.tar.gz || ./target/*.zip (annotated above with '<<<'), where much more detailed information sources can be located.
-

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/plugins/basic-crawler/src/main/assembly/NOTICE-with-deps.txt
----------------------------------------------------------------------
diff --git a/plugins/basic-crawler/src/main/assembly/NOTICE-with-deps.txt b/plugins/basic-crawler/src/main/assembly/NOTICE-with-deps.txt
index 48b46c8..d28669a 100644
--- a/plugins/basic-crawler/src/main/assembly/NOTICE-with-deps.txt
+++ b/plugins/basic-crawler/src/main/assembly/NOTICE-with-deps.txt
@@ -1,5 +1,5 @@
 Apache Any23
-Copyright 2011-2012 The Apache Software Foundation
+Copyright 2011-2017 The Apache Software Foundation
 Copyright 2008-2011 Digital Enterprise Research Institute (DERI)
 
 This product includes software developed by

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/plugins/html-scraper/src/main/assembly/NOTICE-with-deps.txt
----------------------------------------------------------------------
diff --git a/plugins/html-scraper/src/main/assembly/NOTICE-with-deps.txt b/plugins/html-scraper/src/main/assembly/NOTICE-with-deps.txt
index 886dfc6..104dba8 100644
--- a/plugins/html-scraper/src/main/assembly/NOTICE-with-deps.txt
+++ b/plugins/html-scraper/src/main/assembly/NOTICE-with-deps.txt
@@ -1,5 +1,5 @@
 Apache Any23
-Copyright 2011-2012 The Apache Software Foundation
+Copyright 2011-2017 The Apache Software Foundation
 Copyright 2008-2011 Digital Enterprise Research Institute (DERI)
 
 This product includes software developed by

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/plugins/office-scraper/src/main/assembly/NOTICE-with-deps.txt
----------------------------------------------------------------------
diff --git a/plugins/office-scraper/src/main/assembly/NOTICE-with-deps.txt b/plugins/office-scraper/src/main/assembly/NOTICE-with-deps.txt
index 83d33d9..341fcb5 100644
--- a/plugins/office-scraper/src/main/assembly/NOTICE-with-deps.txt
+++ b/plugins/office-scraper/src/main/assembly/NOTICE-with-deps.txt
@@ -1,5 +1,5 @@
 Apache Any23
-Copyright 2011-2012 The Apache Software Foundation
+Copyright 2011-2017 The Apache Software Foundation
 Copyright 2008-2011 Digital Enterprise Research Institute (DERI)
 
 This product includes software developed by

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index a2f5d66..403904f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -239,7 +239,7 @@
     <slf4j.logger.version>1.7.21</slf4j.logger.version>
     <rdf4j.version>2.1.3</rdf4j.version>
     <semargl.version>0.7</semargl.version>
-    <latest.stable.released>1.1</latest.stable.released>
+    <latest.stable.released>2.0</latest.stable.released>
     <form.tracker.id>UA-59636188-1</form.tracker.id>
 
     <!-- Maven Plugin Versions -->

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/service/src/main/assembly/NOTICE-server-embedded.txt
----------------------------------------------------------------------
diff --git a/service/src/main/assembly/NOTICE-server-embedded.txt b/service/src/main/assembly/NOTICE-server-embedded.txt
index 181e3bc..05e5b11 100644
--- a/service/src/main/assembly/NOTICE-server-embedded.txt
+++ b/service/src/main/assembly/NOTICE-server-embedded.txt
@@ -1,5 +1,5 @@
 Apache Any23
-Copyright 2011-2012 The Apache Software Foundation
+Copyright 2011-2017 The Apache Software Foundation
 Copyright 2008-2011 Digital Enterprise Research Institute (DERI)
 
 This product includes software developed by

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/service/src/main/assembly/NOTICE-with-deps.txt
----------------------------------------------------------------------
diff --git a/service/src/main/assembly/NOTICE-with-deps.txt b/service/src/main/assembly/NOTICE-with-deps.txt
index 0a34e9d..84c9c2d 100644
--- a/service/src/main/assembly/NOTICE-with-deps.txt
+++ b/service/src/main/assembly/NOTICE-with-deps.txt
@@ -1,5 +1,5 @@
 Apache Any23
-Copyright 2011-2012 The Apache Software Foundation
+Copyright 2011-2017 The Apache Software Foundation
 Copyright 2008-2011 Digital Enterprise Research Institute (DERI)
 
 This product includes software developed by

http://git-wip-us.apache.org/repos/asf/any23/blob/4a302311/service/src/main/assembly/NOTICE-without-deps.txt
----------------------------------------------------------------------
diff --git a/service/src/main/assembly/NOTICE-without-deps.txt b/service/src/main/assembly/NOTICE-without-deps.txt
index d59b3de..d28093b 100644
--- a/service/src/main/assembly/NOTICE-without-deps.txt
+++ b/service/src/main/assembly/NOTICE-without-deps.txt
@@ -1,5 +1,5 @@
 Apache Any23
-Copyright 2011-2012 The Apache Software Foundation
+Copyright 2011-2017 The Apache Software Foundation
 Copyright 2008-2011 Digital Enterprise Research Institute (DERI)
 
 This product includes software developed by
