[ https://issues.apache.org/jira/browse/LUCENE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-2958:
--------------------------------

    Attachment: LUCENE-2958.patch

Hi, thanks Mike and Shai for the review and the great comments.

Attaching an updated patch.

WriteLineDocTask now writes the field names as a header line in the result file. It currently always does this; a property for disabling the header might be useful, to allow the previous behavior (no header).

There are quite a few involved changes to LineDocSource:
- Replaced line.split(SEP) with the original repeated search for SEP.
- The method fillDocData(doc,fields[]) was changed to take the line String instead of the array of fields.
- That method was wrapped in a new interface, DocDataFiller, for which there are now two implementations:
-- SimpleDocDataFiller is used when the input file has no header line. It implements the original logic from before this change, so existing line-doc files without a header line continue to work.
-- HeaderDocDataFiller is used when the input file has a header line. It populates both the fixed fields and the flexible properties of DocData:
--- When the filler is constructed, a mapping is created from each field position in the header line to a setter method of docData. The mapping uses neither reflection nor a HashMap; it is simply an int[] posToM. If posToM[3] = 1, then later, when handling field no. 3 of a line, the method fillDate3() is called, and it in turn calls docData.setDate(), through a switch statement. If a field has no mapping to a DocData setter, the properties object of docData is populated instead. So this is quite general, with some performance overhead, though less than reflection I think (I did not measure this).
- An extension point for overriding the filler creation is provided through two new methods:
-- createDocDataFiller() for the case of no header line
-- createDocDataFiller(String[] header) when a header line is found in the input
- Note that the filler is created once, when the first line of the input file is read.

Some tests were fixed to account for the existence (or absence) of a header line. I think more tests are required, but you can get the idea of how this code will work.

Bottom line: LineDocSource is more general now, but the code has become more complex. I have mixed feelings about this; I prefer simple code, but the added functionality is appealing.

> WriteLineDocTask improvements
> -----------------------------
>
>                 Key: LUCENE-2958
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2958
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-2958.patch, LUCENE-2958.patch, LUCENE-2958.patch
>
>
> Make WriteLineDocTask and LineDocSource more flexible/extendable:
> * allow to emit lines also for empty docs (keep current behavior as default)
> * allow more/less/other fields
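
For readers who want a feel for the header-driven filling described in the comment above, here is a minimal sketch. It is not the code from the patch. The class names SketchDocData and HeaderFillerSketch, the header names "title", "date" and "body", the tab separator, and the rule that the last column takes the rest of the line are all assumptions made for illustration. It shows only the general idea: an int[] built once from the header, a repeated indexOf search instead of split, and a switch that dispatches to a setter or falls back to the properties object.

```java
import java.util.Properties;

/** Hypothetical stand-in for DocData: a few fixed fields plus free-form properties. */
class SketchDocData {
    String title, date, body;
    final Properties props = new Properties();
    void setTitle(String s) { title = s; }
    void setDate(String s) { date = s; }
    void setBody(String s) { body = s; }
}

/** Sketch of a header-driven filler; all names and constants here are illustrative. */
class HeaderFillerSketch {
    static final char SEP = '\t'; // assumed field separator

    // Codes stored in posToM; PROP means "no setter, store as a property".
    private static final int PROP = 0, TITLE = 1, DATE = 2, BODY = 3;

    private final String[] header;
    private final int[] posToM; // field position -> setter code

    /** Builds the position-to-setter mapping once, from the header fields. */
    HeaderFillerSketch(String[] header) {
        this.header = header.clone();
        this.posToM = new int[header.length];
        for (int i = 0; i < header.length; i++) {
            switch (header[i]) {
                case "title": posToM[i] = TITLE; break;
                case "date":  posToM[i] = DATE;  break;
                case "body":  posToM[i] = BODY;  break;
                default:      posToM[i] = PROP;  break;
            }
        }
    }

    /** Fills doc from one line, scanning for SEP instead of calling split. */
    void fill(SketchDocData doc, String line) {
        int start = 0;
        for (int pos = 0; pos < posToM.length; pos++) {
            int end = line.indexOf(SEP, start);
            // The last column, or a line with fewer fields than the header, takes the rest.
            if (end < 0 || pos == posToM.length - 1) {
                end = line.length();
            }
            String value = line.substring(start, end);
            switch (posToM[pos]) {
                case TITLE: doc.setTitle(value); break;
                case DATE:  doc.setDate(value);  break;
                case BODY:  doc.setBody(value);  break;
                default:    doc.props.setProperty(header[pos], value); break;
            }
            if (end == line.length()) {
                break; // nothing left to scan
            }
            start = end + 1;
        }
    }
}
```

With a header of title, date, lang, body, a line would have its first two fields routed to setTitle and setDate, the "lang" field stored in the properties object, and everything after the third separator passed to setBody. Whether an int[] plus switch is measurably faster than a HashMap or reflection is not established here, just as the comment itself notes that it was not measured.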