This is an automated email from the ASF dual-hosted git repository.
lewismc pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git.
from 0f942ec Merge pull request #434 from lewismc/TIKA-3383
new b97ae8a Created basic structure
new 62783b2 pom build fix
new bf4f1fe Copied function definitions over from translate and changed
class variable types
new a608dfa validSourceLanguages for StartStreamTranscription added
new e8e0a06 validSourceLanguages for StartStreamTranscription added
new 04cff77 Check that input sourceLanguage is valid for Amazon's
StartStreamTranscription
new c60c937 Added AmazonTranscribeGuessLanguageTest
new 7ddbff2 Fixing Merge Conflicts
new e01d4a9 overwrote Interface
new 58461cf Merge branch 'TIKA-94' of https://github.com/rohan2810/tika
into TIKA-94
new 5321b69 changed exception string throw
new 068120f reduced transcribe runtime by implementing HashSet
new 205faa1 Changed variable reference to method call
new ffbaad5 Added javadoc description
new 2b82324 Instantiated bucketname, clientID, and secret in constructor
new 931b00f amazon dependencies added to header
new 0160246 added amazonaws dependency
new e3e94c2 Merge branch 'TIKA-94' of https://github.com/rohan2810/tika
into TIKA-94
new 65acd7e Completed AWS audio transcribe. Lewis can you review?
new e025d66 Merge
new ced2ba2 Adding AmazonTranscribeGuessLanguageTest
new 0600a4b Updating AmazonTranscribeTest
new d98aebf > removed aws from core
new c53f73c Package Name refactoring, more generic interface, changes in
implementation
new 04ce661 Merge branch 'TIKA-94' of https://github.com/rohan2810/tika
into TIKA-94
new 0333578 uploadFileToBucket is now a private method. key -> jobName
TODO add documentation for the methods
new 79030a8 Merge branch 'TIKA-94' of https://github.com/rohan2810/tika
into TIKA-94
new aa76fc7 Updated AmazonTranscribeGuessLanguageTest to mesh with
AmazonTranscribe interface
new 60d131d Pushed changes to interface and make it AWS independent
new ec5de38 Changed jobName from filename to auto generated UUID
new dfd21f3 should not be creating new jobname in Upload file to bucket
new a2c2c61 Updated AmazonTranscribeGuessLanguageTest to call
getTranscriptResult
new 17f2d10 fix for TIKA-94 contributed by phantuanminh: Rename package
(Fix typo). Add simple test and test files
new 58784b7 Resolved Merge Conflicts with other test files
new 6921ffb Added usage of test resource files in
AmazonTranscribeGuessLanguageTest
new ad6ac1b added documentation. Made the interface AWS independent some
other small fixes.
new 598edd6 no need for support of overhead conversion of mp4 to mp3
new 6e07c57 Revert "no need for support of overhead conversion of mp4 to
mp3"
new cae79ac few fixes based off comments
new 3688d2d Added de-DE, en-AU, en-GB, en-US, it-IT, ja-JP, ko-KR, pt-BR
audio samples to test resources
new 0220d57 Added tests for new audio resource files
new c7763b7 Added documentation and mp4 file tests to
AmazonTranscribeGuessLanguageTest
new b7221e4 remove video
new 7805990 remove video
new 84f7de6 startTranscribe-> transcribe
new 818be72 test refactoring
new 927ce49 dependency fix
new dc2f979 white spaces and fixes
new fb1a86f imports fix
new 600d972 added some comments
new b5be3be added comments
new 09ed087 Merge branch 'TIKA-94' of https://github.com/rohan2810/tika
into TIKA-94
new 9d2c986 Changed startTranscribe() to transcribe()
new a568b13 comments
new e8ff342 String -> InputStream
new b2bcfd3 String -> InputStream TODO:Testing -> AWS & new changes
testing.
new aba21b8 Merge remote-tracking branch 'origin/TIKA-94' into TIKA-94
new f27f864 Add documentation for tests, modify and merge test files
new f3e284d Merge branch 'TIKA-94' of https://github.com/rohan2810/tika
into TIKA-94
new f633e65 [TIKA-94] Speech-to-text transcription
new 2d0f9e2 Merge pull request #406 from rohan2810/TIKA-94
The 5223 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
pom.xml | 1 +
.../org/apache/tika/transcribe/Transcriber.java | 60 +++
tika-transcribe/pom.xml | 150 ++++++
.../apache/tika/transcribe/AmazonTranscribe.java | 264 +++++++++++
.../org.apache.tika.language.translate.Translator | 2 +-
.../transcribe.amazon.properties | 5 +-
.../tika/transcribe/AmazonTranscribeTest.java | 527 +++++++++++++++++++++
.../src/test/resources/ShortAudioSampleFrench.mp3 | Bin 0 -> 25861 bytes
.../test/resources/de-DE_(We_Are_At_School_x2).mp3 | Bin 0 -> 38547 bytes
.../resources/en-AU_(A_Little_Bottle_Of_Water).mp3 | Bin 0 -> 33365 bytes
.../resources/en-GB_(A_Little_Bottle_Of_Water).mp3 | Bin 0 -> 35872 bytes
.../resources/en-US_(A_Little_Bottle_Of_Water).mp3 | Bin 0 -> 29603 bytes
tika-transcribe/src/test/resources/en-US_(Hi).mp4 | Bin 0 -> 21739 bytes
.../resources/it-IT_(We_Are_Having_Class_x2).mp3 | Bin 0 -> 42219 bytes
.../test/resources/ja-JP_(We_Are_At_School).mp3 | Bin 0 -> 21699 bytes
.../src/test/resources/ko-KR_(Annyeonghaseyo).mp4 | Bin 0 -> 144151 bytes
.../resources/ko-KR_(We_Are_Having_Class_x2).mp3 | Bin 0 -> 66843 bytes
.../test/resources/pt-BR_(We_Are_At_School).mp3 | Bin 0 -> 29043 bytes
18 files changed, 1006 insertions(+), 3 deletions(-)
create mode 100644 tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
create mode 100644 tika-transcribe/pom.xml
create mode 100644 tika-transcribe/src/main/java/org/apache/tika/transcribe/AmazonTranscribe.java
copy tika-core/src/main/resources/META-INF/services/org.apache.tika.detect.Detector => tika-transcribe/src/main/resources/META-INF.services/org.apache.tika.language.translate.Translator (93%)
copy tika-parsers/tika-parsers-advanced/tika-parser-nlp-module/src/test/resources/org/apache/tika/parser/ner/regex/ner-regex.txt => tika-transcribe/src/main/resources/org.apache.tika.transcribe/transcribe.amazon.properties (88%)
create mode 100644 tika-transcribe/src/test/java/org/apache/tika/transcribe/AmazonTranscribeTest.java
create mode 100644 tika-transcribe/src/test/resources/ShortAudioSampleFrench.mp3
create mode 100644 tika-transcribe/src/test/resources/de-DE_(We_Are_At_School_x2).mp3
create mode 100644 tika-transcribe/src/test/resources/en-AU_(A_Little_Bottle_Of_Water).mp3
create mode 100644 tika-transcribe/src/test/resources/en-GB_(A_Little_Bottle_Of_Water).mp3
create mode 100644 tika-transcribe/src/test/resources/en-US_(A_Little_Bottle_Of_Water).mp3
create mode 100644 tika-transcribe/src/test/resources/en-US_(Hi).mp4
create mode 100644 tika-transcribe/src/test/resources/it-IT_(We_Are_Having_Class_x2).mp3
create mode 100644 tika-transcribe/src/test/resources/ja-JP_(We_Are_At_School).mp3
create mode 100644 tika-transcribe/src/test/resources/ko-KR_(Annyeonghaseyo).mp4
create mode 100644 tika-transcribe/src/test/resources/ko-KR_(We_Are_Having_Class_x2).mp3
create mode 100644 tika-transcribe/src/test/resources/pt-BR_(We_Are_At_School).mp3