[ https://issues.apache.org/jira/browse/NUTCH-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064226#comment-18064226 ]
ASF GitHub Bot commented on NUTCH-1446:
---------------------------------------
lewismc commented on code in PR #905:
URL: https://github.com/apache/nutch/pull/905#discussion_r2907415018
##########
src/java/org/apache/nutch/indexer/IndexerOutputFormat.java:
##########
@@ -40,32 +33,67 @@ public RecordWriter<Text, NutchIndexAction> getRecordWriter(
     Configuration conf = context.getConfiguration();
     final IndexWriters writers = IndexWriters.get(conf);
-    String name = getUniqueFile(context, "part", "");
-    writers.open(conf, name);
+    // open writers (no temporary file output anymore)
+    writers.open(conf, "index");
     LOG.info(writers.describe());
     return new RecordWriter<Text, NutchIndexAction>() {
       @Override
       public void close(TaskAttemptContext context) throws IOException {
-        // do the commits once and for all the reducers in one go
-        boolean noCommit = conf
-            .getBoolean(IndexerMapReduce.INDEXER_NO_COMMIT, false);
+
+        boolean noCommit =
+            conf.getBoolean(IndexerMapReduce.INDEXER_NO_COMMIT, false);
+
         if (!noCommit) {
           writers.commit();
         }
+
         writers.close();
       }
       @Override
       public void write(Text key, NutchIndexAction indexAction)
           throws IOException {
+
         if (indexAction.action == NutchIndexAction.ADD) {
           writers.write(indexAction.doc);
+
         } else if (indexAction.action == NutchIndexAction.DELETE) {
           writers.delete(key.toString());
         }
       }
     };
   }
-}
+
+  @Override
+  public void checkOutputSpecs(JobContext context)
+      throws IOException, InterruptedException {
+    // No output specs required since we don't write files
+  }
+
+  @Override
+  public OutputCommitter getOutputCommitter(TaskAttemptContext context)
+      throws IOException, InterruptedException {
+
+    return new OutputCommitter() {
+
+      @Override
+      public void setupJob(JobContext jobContext) {}
+
+      @Override
+      public void setupTask(TaskAttemptContext taskContext) {}
+
+      @Override
+      public boolean needsTaskCommit(TaskAttemptContext taskContext) {
+        return false;
+      }
+
+      @Override
+      public void commitTask(TaskAttemptContext taskContext) {}
+
+      @Override
+      public void abortTask(TaskAttemptContext taskContext) {}
+    };
Review Comment:
@shishir-kuet can you address this issue? Thank you.
> Port NUTCH-1444 to trunk (Indexing should not create temporary files)
> ---------------------------------------------------------------------
>
> Key: NUTCH-1446
> URL: https://issues.apache.org/jira/browse/NUTCH-1446
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy
> Priority: Major
> Fix For: 1.23