Now that you mention it... http://linux.slashdot.org/story/13/07/15/2316219/kernel-dev-tells-linus-torvalds-to-stop-using-abusive-language
Micru

On Wed, Jul 17, 2013 at 11:36 AM, Brion Vibber <[email protected]> wrote:
> I'm not sure his attitude will encourage people to work with him to his
> specifications.
>
> -- brion
>
> On Wed, Jul 17, 2013 at 8:12 AM, David Cuenca <[email protected]> wrote:
>
> > I'm forwarding this message by George Orwell III on en-ws [1]. I think it
> > is extremely important as it offers an insight about what is wrong with
> > Djvu handling on Wikisource.
> >
> > "We/you are losing the X-min, Y-min, X-Max & Y-max (mapping coordinates)
> > because the original PHP contributing a-hole for the DjVu routine on our
> > servers never bothered to finish the part where the internal DjVu text
> > layer is converted to a (coordinate rich) XML file using the existing
> > DjVuLibre software package because, at the time, the software had issues.
> >
> > "That faulty DjVuLibre version was the equivalent of 4,317 versions ago
> > and the issue has been long fixed now EXCEPT that the .DTD file needed to
> > base the plain-text to XML conversion on still has the wrong 'folder path'
> > on local DjVuLibre installs (if this is true on server installs as well, I
> > cannot say for sure). Once I copied the folder to the [wrong] folder path,
> > I was able to generate the XMLs all day long. These XMLs are just like the
> > ones IA generates during their process (in addition to the XML that ABBYY
> > generates for them).
> >
> > "So it's not that we as a community decided not to follow through with
> > (coordinate rich) XML generation but got stuck with the plain-text dump
> > workaround due to a DjVuLibre problem that no longer exists. Plus, the guy
> > who created the beginnings of this fabulous disaster was like a tick with
> > an attention span deficit and moved on to conjuring up some other blasted
> > thing or another instead of following up on his own workaround & finishing
> > the XML coding portion once the DjVuLibre glitch was fixed."
> > -- 15:16, 15 July 2013 (UTC)
> >
> > [1]
> > http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
> >
> > On Wed, Jul 17, 2013 at 6:57 AM, Alex Brollo <[email protected]> wrote:
> >
> > > Just a brief comment about the djvu text layer, using IA files to dig
> > > deeper into the topic.
> > >
> > > FineReader OCR stores incredibly detailed information in a proprietary
> > > format; then, various FineReader versions export part of this
> > > extremely rich set of information into different outputs, one of them
> > > being the djvu text layer. It's worth noting that even if any
> > > information stored in the djvu text layer can be extracted and used,
> > > the set of information wrapped into the djvu text layer (either in
> > > lisp-like format or in xml format) is only a minor subset of the
> > > original OCR information.
> > >
> > > If someone is interested in getting much more information, they can
> > > find it in the abbyy.xml output; Internet Archive offers it as abbyy.gz
> > > in the list of exportable files. It's a very heavy and complex xml
> > > structure, but it is possible to parse it and to extract from it any
> > > information wrapped into the djvu text layer and much more. Most
> > > interestingly, it carries wordPenalty, that is, word by word, a summary
> > > of the degree of uncertainty of the OCR recognition of the whole word.
> > >
> > > We (I and Aarti) are digging into this mess, with fast preliminary
> > > results; you can see in [[it:w:Utente:Alex brollo/Sandbox]] some brief
> > > pieces of text extracted from abbyy.gz, where doubtful words (in the
> > > opinion of the OCR software) are red. They can be easily managed by
> > > VisualEditor, since the marking comes from a simple span tag.
> > >
> > > Now I'm waiting for Aarti's work; as soon as a VisualEditor for nsPage
> > > runs, it will be possible to extract text by bot from abbyy.gz (if the
> > > work comes from IA) and to upload such text as OCR.
> > >
> > > Alex
> > >
> > > 2013/7/16 David Cuenca <[email protected]>
> > >
> > >> Hi Aubrey,
> > >> Thanks for the heads-up, I have CC'ed Sébastien from fr-ws, he worked
> > >> on the djvu text extraction/merging and he was interested in
> > >> following up on that. Maybe he has some fresh ideas about it.
> > >>
> > >> Micru
> > >>
> > >> On Tue, Jul 16, 2013 at 10:24 AM, Andrea Zanni <[email protected]> wrote:
> > >>
> > >>> Hi David, Aarti, thibaud and Tpt,
> > >>> please look at this thread:
> > >>>
> > >>> http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
> > >>> especially the last message.
> > >>>
> > >>> It seems George Orwell III knows his stuff about Djvu and the
> > >>> Proofread extension, and it's probably worth digging into this
> > >>> "layer text" djvu thing.
> > >>>
> > >>> Even if I might dream of an ideal solution (a "layered structure" for
> > >>> wikisource, in which text can be marked up several times in different
> > >>> layers), that is probably very far away.
> > >>>
> > >>> But it's still important to pave the way for further improvements, I
> > >>> guess: losing all the information from a formatted, mapped IA djvu is
> > >>> not a good thing to do, IMHO.
> > >>> And the Visual Editor could help us, in the future, to keep some of
> > >>> that information (italics, bold, etc.)
> > >>>
> > >>> I know Aarti spoke with Alex about abbyy.xml: is it possible to do
> > >>> something with it?
> > >>>
> > >>> Aubrey
> > >>
> > >> --
> > >> Etiamsi omnes, ego non
> > >> _______________________________________________
> > >> Wikisource-l mailing list
> > >> [email protected]
> > >> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
> >
> > _______________________________________________
> > MediaWiki-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
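
For anyone who wants to try what Alex describes above (pulling words out of
abbyy.xml together with their wordPenalty and coordinates, then flagging the
doubtful ones with a span tag), here is a minimal sketch. It is not Alex's or
Aarti's code. It assumes the per-character elements are called charParams and
carry l, t, r, b, wordStart and wordPenalty attributes; the sample XML is
hand-written for illustration, and the threshold of 10 is arbitrary. Check
the names against a real abbyy.gz before relying on them.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample. Real abbyy.gz files are far larger and
# should be decompressed first (e.g. with the gzip module).
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
<page><block><text><par><line><formatting>
<charParams l="10" t="5" r="20" b="30" wordStart="true" wordPenalty="0">T</charParams>
<charParams l="21" t="5" r="30" b="30" wordStart="false" wordPenalty="0">o</charParams>
<charParams l="31" t="5" r="40" b="30" wordStart="false" wordPenalty="0"> </charParams>
<charParams l="41" t="5" r="50" b="30" wordStart="true" wordPenalty="36">b</charParams>
<charParams l="51" t="5" r="60" b="30" wordStart="false" wordPenalty="36">c</charParams>
</formatting></line></par></text></block></page>
</document>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def words_from_abbyy(xml_text):
    """Group charParams elements into words with a penalty and a bounding box."""
    words = []
    current = None
    for el in ET.fromstring(xml_text).iter():
        if local_name(el.tag) != "charParams":
            continue
        ch = el.text or ""
        if not ch.strip():
            current = None  # whitespace ends the current word
            continue
        l, t, r, b = (int(el.get(k)) for k in ("l", "t", "r", "b"))
        penalty = int(el.get("wordPenalty", "0"))
        if current is None or el.get("wordStart") == "true":
            current = {"text": ch, "penalty": penalty, "box": [l, t, r, b]}
            words.append(current)
        else:
            current["text"] += ch
            current["penalty"] = max(current["penalty"], penalty)
            box = current["box"]
            # Grow the box so it covers every character of the word.
            current["box"] = [min(box[0], l), min(box[1], t),
                              max(box[2], r), max(box[3], b)]
    return words


def mark_doubtful(words, threshold):
    """Join words with spaces, wrapping those above the threshold in a red span."""
    parts = []
    for w in words:
        text = html.escape(w["text"])
        if w["penalty"] > threshold:
            text = '<span style="color:red">' + text + "</span>"
        parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    print(mark_doubtful(words_from_abbyy(SAMPLE), threshold=10))
```

The bounding box kept per word is the same kind of X-min, Y-min, X-max, Y-max
information that George Orwell III says is lost in the plain-text dump, so
the same loop could feed a coordinate-aware text layer if one existed.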
