[ https://issues.apache.org/jira/browse/LUCENE-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-2958:
--------------------------------

    Attachment: LUCENE-2958.patch

Hi, thanks Mike and Shai for the review and great comments.

Attaching an updated patch.

WriteLineDocTask now writes the field names as a header line at the top of the
result file.

It always does this - a property to disable the header might be useful for
allowing the previous behavior (no header).
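
To illustrate the idea, here is a minimal sketch of writing and detecting such
a header line. It is not taken from the patch: the separator, the HEADER###
marker and the class and method names are all assumptions made for the example.

```java
// Hypothetical sketch, not the patch code: names and marker are assumptions.
class HeaderLineSketch {
    // Assumed field separator; the real task may use a different one.
    static final char SEP = '\t';
    // Hypothetical marker that lets a reader tell a header from a data line.
    static final String HEADER_PREFIX = "HEADER###";

    // Build the header line from the field names, in output order.
    static String headerLine(String[] fieldNames) {
        StringBuilder sb = new StringBuilder(HEADER_PREFIX);
        for (String name : fieldNames) {
            sb.append(SEP).append(name);
        }
        return sb.toString();
    }

    // Return the field names if the line is a header, or null otherwise.
    static String[] parseHeader(String firstLine) {
        String marker = HEADER_PREFIX + SEP;
        if (!firstLine.startsWith(marker)) {
            return null; // old-style file: the first line is already a document
        }
        return firstLine.substring(marker.length()).split(String.valueOf(SEP), -1);
    }
}
```

With a marker like this, a reader can inspect only the first line of the input
to decide whether it is dealing with an old file (no header) or a new one.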

There are quite a few involved changes to LineDocSource:

- Replaced line.split(SEP) with the original repeated search for SEP.
- Method fillDocData(doc,fields[]) was changed to take the line String instead
of the array of fields.
- That method was wrapped in a new interface, DocDataFiller, for which there
are now two implementations:
-- SimpleDocDataFiller is used when there is no header line in the input file.
It implements the original logic from before this change, so existing
line-doc files which have no header line can still be used.
-- HeaderDocDataFiller is used when there is a header line in the input
file. Its implementation populates both the fixed fields and the flexible
properties of DocData:
--- At construction of the filler, a mapping is created from the field position
in the header line to a setter method of docData. That mapping uses neither
reflection nor a HashMap - it is simply an int[] posToM, where if posToM[3] = 1
then later, when handling field no. 3 in the line, the method fillDate3() will
be called, and it will, in turn, call docData.setDate(), through a switch
statement. If a field has no mapping to a DocData setter, the DocData
properties object is populated instead. So, this is quite general, with some
performance overhead, though less than reflection I think (I did not measure
this).
- Filler creation can be overridden through two new extension-point methods:
-- createDocDataFiller() for the case of no header line
-- createDocDataFiller(String[] header) when a header line is found in the input
- Note that filler creation is done once, when reading the first line of the 
input file. 
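
For readers who do not want to open the patch, the header-driven filling
described above can be sketched roughly as follows. This is a simplified
illustration, not the patch code: the DocData stand-in, the class name
HeaderFillerSketch, the setter codes and the field names "title", "date" and
"body" are assumptions, and the string switch in the constructor needs Java 7
or later.

```java
import java.util.Properties;

// Minimal stand-in for the benchmark DocData class (only what the sketch needs).
class DocData {
    String title, date, body;
    Properties props;
    void setTitle(String t) { title = t; }
    void setDate(String d) { date = d; }
    void setBody(String b) { body = b; }
}

// Hypothetical filler: maps header positions to setters through an int[].
class HeaderFillerSketch {
    static final char SEP = '\t'; // assumed separator

    // Codes stored in posToM; PROP means "no fixed setter, store as a property".
    static final int PROP = 0, TITLE = 1, DATE = 2, BODY = 3;

    private final String[] header;
    private final int[] posToM; // field position -> setter code

    // Built once, from the header line of the input file.
    HeaderFillerSketch(String[] header) {
        this.header = header;
        this.posToM = new int[header.length];
        for (int i = 0; i < header.length; i++) {
            switch (header[i]) {
                case "title": posToM[i] = TITLE; break;
                case "date":  posToM[i] = DATE;  break;
                case "body":  posToM[i] = BODY;  break;
                default:      posToM[i] = PROP;
            }
        }
    }

    // Walk the line with repeated indexOf(SEP) instead of line.split(SEP).
    void fillDocData(DocData doc, String line) {
        int start = 0;
        for (int pos = 0; pos < header.length && start <= line.length(); pos++) {
            int end = line.indexOf(SEP, start);
            if (end < 0) {
                end = line.length(); // last field on the line
            }
            String value = line.substring(start, end);
            switch (posToM[pos]) {
                case TITLE: doc.setTitle(value); break;
                case DATE:  doc.setDate(value);  break;
                case BODY:  doc.setBody(value);  break;
                default:
                    // No fixed setter for this field: keep it as a property.
                    if (doc.props == null) {
                        doc.props = new Properties();
                    }
                    doc.props.setProperty(header[pos], value);
            }
            start = end + 1;
        }
    }
}
```

The per-field cost is one array lookup and one switch, which is why the
overhead should stay below that of reflection, although (as said above) this
was not measured.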

Some tests were fixed to account for the existence (or absence) of a header 
line.

I think more tests are required, but you can get an idea of how this code will
work.

Bottom line, LineDocSource is more general now, but the code became more 
complex.

I have mixed feelings about this - I prefer simple code, but the added
functionality is appealing.

> WriteLineDocTask improvements
> -----------------------------
>
>                 Key: LUCENE-2958
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2958
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-2958.patch, LUCENE-2958.patch, LUCENE-2958.patch
>
>
> Make WriteLineDocTask and LineDocSource more flexible/extendable:
> * allow to emit lines also for empty docs (keep current behavior as default)
> * allow more/less/other fields

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

