GitHub user jessehatfield opened a pull request:
https://github.com/apache/incubator-rya/pull/55
Consolidated MapReduce API and applications into toplevel project.
Fixes/functionality:
- Fixed hashCode/compareTo/equals methods in RyaStatementWritable, RyaType,
and RyaURI
- Minor tweaks to AccumuloHDFSFileInputFormat, which seemed to be broken
with Accumulo 1.6 but should now work
- Made RdfFileInputFormat threaded, allowing it to handle all RDF input
that was previously handled by various tools
- Added entity-centric indexing to RyaOutputFormat
- Added methods for using Rya input and output formats to
AbstractAccumuloMRTool (renamed setupInputFormat to setupAccumuloInput since
there are now multiple input options)
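
As a rough illustration of what consistent hashCode/compareTo/equals means for a value class, here is a minimal sketch. TypedValue and its two fields are hypothetical stand-ins, not the actual RyaType or RyaURI code; the point is only that all three methods look at the same fields, so that compareTo returns 0 exactly when equals returns true.

```java
import java.util.Objects;

// Hypothetical value class (not the real RyaType): a datatype plus a data string.
final class TypedValue implements Comparable<TypedValue> {
    private final String dataType;
    private final String data;

    TypedValue(String dataType, String data) {
        this.dataType = dataType;
        this.data = data;
    }

    // equals, hashCode and compareTo all use the same two fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TypedValue)) {
            return false;
        }
        TypedValue other = (TypedValue) o;
        return Objects.equals(dataType, other.dataType)
                && Objects.equals(data, other.data);
    }

    @Override
    public int hashCode() {
        return Objects.hash(dataType, data);
    }

    // Orders by data first, then by datatype; returns 0 only for equal objects.
    @Override
    public int compareTo(TypedValue other) {
        int byData = compareNullable(data, other.data);
        if (byData != 0) {
            return byData;
        }
        return compareNullable(dataType, other.dataType);
    }

    // Null-safe string comparison that sorts null before any non-null value.
    private static int compareNullable(String a, String b) {
        if (a == null) {
            return b == null ? 0 : -1;
        }
        if (b == null) {
            return 1;
        }
        return a.compareTo(b);
    }
}
```

With the three methods in agreement, such objects behave the same way in hash-based and sorted collections, which matters when they are used as MapReduce keys.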
Organization/documentation/consistency:
- Minor renaming and repackaging (standalone tools go in "tools";
"fileinput" and "utils" packages removed and their classes moved one level
up; RyaStatementInputFormat renamed to RyaInputFormat)
- Removed StatementWritable, changed to RyaStatementWritable where
applicable (for consistency with other code -- RyaStatementWritable can hold
more metadata, so it was the one to keep)
- Documented code and added a MapReduce page to the manual
- Added "examples" package with one example
Removing redundancy:
- Removed *NullIndexer classes, which were only used in RyaOutputFormat
(now, disabling an indexer simply sets it to an actual null value, and
RyaOutputFormat checks for this before storing statements)
- Removed RyaStatementMapper and RyaStatementReducer: these simply insert
records into Rya. This functionality can be achieved with RyaOutputFormat
and the default mapper/reducer, so these two classes seem redundant.
- Removed BulkNtripsInputTool, BulkNtripsInputToolIndexing,
RdfFileInputByLineTool, and RyaBatchWriterInputTool: fixes to
RdfFileInputFormat allow RdfFileInputTool to handle all the file input use
cases (it supports configurable secondary indexers, handles any format, and
scales), rendering the other file import tools redundant. (Previously, all
five tools had largely overlapping but subtly different behavior.)
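
The null-check approach that replaces the *NullIndexer classes can be sketched roughly as follows. This is an illustration only: Indexer, RecordingIndexer, and StatementWriter are hypothetical names invented for the example and do not reflect the actual RyaOutputFormat code. A disabled indexer is represented by a plain null reference, and the writer checks each reference before storing a statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal indexer contract, for illustration only.
interface Indexer {
    void storeStatement(String statement);
}

// Simple indexer that records what it was asked to store.
final class RecordingIndexer implements Indexer {
    final List<String> stored = new ArrayList<>();

    @Override
    public void storeStatement(String statement) {
        stored.add(statement);
    }
}

// Hypothetical writer: each optional indexer is either a real instance or null.
final class StatementWriter {
    final List<String> core = new ArrayList<>();
    private final Indexer freeTextIndexer; // null means "disabled"
    private final Indexer entityIndexer;   // null means "disabled"

    StatementWriter(Indexer freeTextIndexer, Indexer entityIndexer) {
        this.freeTextIndexer = freeTextIndexer;
        this.entityIndexer = entityIndexer;
    }

    void write(String statement) {
        // The core store always receives the statement.
        core.add(statement);
        // Optional indexers are skipped when disabled, with no no-op subclass needed.
        if (freeTextIndexer != null) {
            freeTextIndexer.storeStatement(statement);
        }
        if (entityIndexer != null) {
            entityIndexer.storeStatement(statement);
        }
    }
}
```

The trade-off is one explicit null check per indexer in the write path in exchange for dropping a do-nothing class per indexer type.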
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jessehatfield/incubator-rya RYA-76-mapreduce-organization
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-rya/pull/55.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #55
----
commit 18552132e8bfecec33b57aae302e25d93c28e7e3
Author: Jesse Hatfield <[email protected]>
Date: 2016-06-30T19:50:08Z
Consolidated MapReduce API and applications into toplevel project.
Changes include:
- Made RdfFileInputFormat threaded, allowing it to handle all RDF input
- Added entity-centric indexing to RyaOutputFormat
- Added methods for using Rya input and output formats to
AbstractAccumuloMRTool (renamed setupInputFormat to setupAccumuloInput since
there are now multiple input options)
- Removed *NullIndexer classes, which were only used in RyaOutputFormat
- Removed StatementWritable, changed to RyaStatementWritable where
applicable (for consistency)
- Fixed hashCode/compareTo/equals methods in RyaStatementWritable, RyaType,
and RyaURI
- Minor renaming and repackaging (standalone tools go in "tools",
"fileinput" and "utils" removed)
- Minor tweaks to AccumuloHDFSFileInputFormat, which seemed to be broken
with Accumulo 1.6
- Documented code and added a MapReduce page to the manual
- Added "examples" package with one example
- Removed RyaStatementMapper and RyaStatementReducer: these simply insert
records into Rya. This functionality can be achieved with RyaOutputFormat
and the default mapper/reducer, so these two classes seem redundant.
- Removed BulkNtripsInputTool, BulkNtripsInputToolIndexing,
RdfFileInputByLineTool, and RyaBatchWriterInputTool: Fixes to
RdfFileInputFormat now allow RdfFileInputTool to now handle all the file
input use cases (configurable secondary indexers, handles any format,
scales), rendering the other file import tools redundant. (Previously, all
five tools had largely overlapping but subtly different behavior.)
----