This is an automated email from the ASF dual-hosted git repository.
totaro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git.
from d1a8bff TIKA-2459 -- fix special character handling
add e763021 Improvement for TIKA-2449 contributed by Giuseppe Totaro
add 7b869c0 Added a regular expression to match standard word within a
pattern for TIKA-2449 contributed by Giuseppe Totaro
add 31625a2 Used the alphabetical order for the list of the standard
organizations by relying on TreeMap. Thanks to Lewis McGibbney for this
insightful suggestion (TIKA-2449 contributed by Giuseppe Totaro).
new 7dd38d5 Merge branch 'master' of https://github.com/apache/tika
new db89ab3 TIKA-2449: Enabling extraction of standard references from text
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
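
Revision 31625a2 above mentions relying on TreeMap to keep the list of standard organizations in alphabetical order. The snippet below is only a minimal sketch of that general idea, not the code from StandardOrganizations.java: the class name OrganizationOrderSketch, its methods, and the three sample entries are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not taken from the Tika sources.
public class OrganizationOrderSketch {

    // A TreeMap iterates its keys in natural String order, so insertion
    // order does not matter. The entries here are sample data only.
    static Map<String, String> sortedOrganizations() {
        Map<String, String> organizations = new TreeMap<>();
        organizations.put("ISO", "International Organization for Standardization");
        organizations.put("ANSI", "American National Standards Institute");
        organizations.put("IEEE", "Institute of Electrical and Electronics Engineers");
        return organizations;
    }

    // Returns the acronyms in the order the TreeMap iterates them.
    static List<String> sortedAcronyms() {
        return new ArrayList<>(sortedOrganizations().keySet());
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : sortedOrganizations().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Because the ordering comes from the map itself, no explicit sort step is needed when a new organization is added; a HashMap would give no such ordering guarantee.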
Summary of changes:
CHANGES.txt | 2 +
.../org/apache/tika/sax/StandardOrganizations.java | 166 ++++++++++++++++++
.../org/apache/tika/sax/StandardReference.java | 124 ++++++++++++++
.../sax/StandardsExtractingContentHandler.java | 116 +++++++++++++
.../java/org/apache/tika/sax/StandardsText.java | 188 +++++++++++++++++++++
.../tika/example/StandardsExtractionExample.java | 109 ++++++++++++
.../sax/StandardsExtractingContentHandlerTest.java | 55 ++++++
.../test-documents/testStandardsExtractor.pdf | Bin 0 -> 143659 bytes
8 files changed, 760 insertions(+)
create mode 100644 tika-core/src/main/java/org/apache/tika/sax/StandardOrganizations.java
create mode 100644 tika-core/src/main/java/org/apache/tika/sax/StandardReference.java
create mode 100644 tika-core/src/main/java/org/apache/tika/sax/StandardsExtractingContentHandler.java
create mode 100644 tika-core/src/main/java/org/apache/tika/sax/StandardsText.java
create mode 100644 tika-example/src/main/java/org/apache/tika/example/StandardsExtractionExample.java
create mode 100644 tika-parsers/src/test/java/org/apache/tika/sax/StandardsExtractingContentHandlerTest.java
create mode 100644 tika-parsers/src/test/resources/test-documents/testStandardsExtractor.pdf