Greetings. I am afraid I owe you an apology -- I went to make some modifications to Tika for the app we are working on, and that got me into the code that translates text/plain to XHTML. I could have sworn it didn't work before: I thought special characters weren't being translated, and I now find that my examples work after all. Mea culpa.
The one good thing that came of this was the exercise itself -- climbing around the Java class hierarchy gave me a feel for the way Tika is organized, as well as some practice with Java, which is a new language for me.
This leaves just one change to Tika that I wonder about, and it might be more appropriate to put it in the app itself rather than in Tika. Our app will be an editor/transcriber tool for producing braille from print books or other files. The leader of this project wants newlines handled as follows: 2 consecutive newlines are to generate a <p> paragraph marker. In addition, he is concerned about how carriage return/newline pairs are handled and how they should affect the flow. I still need to pin him down on exactly what should happen.
Anyway, this needs to be specified before I can do anything with it, but the problem does affect Tika if I use Tika for text/plain files, since by the time the text gets to the user it will already have been rendered from the XHTML. I will be discussing this issue with the group, and if I need to post again I will definitely try to be more prepared...:-/
Thanks for the comments. --le
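P.S. To make the newline question concrete, here is a rough sketch of the rule as I currently understand it, written as plain Java that could live in the app rather than in Tika. Everything here is an assumption on my part: the class and method names (ParagraphSplitter, splitParagraphs, toXhtml) are made up for illustration and are not Tika APIs, and I have guessed that CR LF and a lone CR should both count as a single newline, which is exactly the part that still has to be specified. Single newlines inside a paragraph are left untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the app; not part of Tika.
class ParagraphSplitter {

    // Splits plain text into paragraphs at runs of 2 or more newlines.
    static List<String> splitParagraphs(String text) {
        // Assumption: CR LF and a lone CR are each treated as one newline.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        List<String> paragraphs = new ArrayList<>();
        for (String chunk : normalized.split("\n{2,}")) {
            // Skip the empty chunk produced when the text starts with blank lines.
            if (!chunk.isEmpty()) {
                paragraphs.add(chunk);
            }
        }
        return paragraphs;
    }

    // Wraps each paragraph in a <p> element, escaping markup characters.
    static String toXhtml(String text) {
        StringBuilder out = new StringBuilder();
        for (String paragraph : splitParagraphs(text)) {
            out.append("<p>").append(escape(paragraph)).append("</p>\n");
        }
        return out.toString();
    }

    // Minimal escaping for element content; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(toXhtml("First line\r\nstill first\r\n\r\nSecond & last"));
    }
}
```

If the group decides that a single newline should force a line break, or that a bare CR means something different, only the normalization line and the split pattern would need to change.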
