On Tue, 4 Feb 2014 07:34:24 +0100 Liviu Andronic <[email protected]> wrote:
> On Tue, Feb 4, 2014 at 2:38 AM, Steve Litt
> <[email protected]> wrote:
> > I just created a brand new LyX file in 2.0.6, which is the packaged
> > LyX for Ubuntu 13.10, and there wasn't a bit of XML in it, well
> > formed or otherwise. It was basically the same format human
> > parseable format it's been for 10 years, but with a lot more insets
> > and options. But as far
> >
> Quick question: What are your reasons / needs for being able to
> "humanly" parse the file format? Would a tool like pLyX address these
> needs: http://wiki.lyx.org/Examples/FindAndReplaceLyXFormatElements ?
>
> Liviu

Thanks for asking, Liviu.

Let me answer your second question first: From the URL you gave, I can't determine how much or how little pLyX would help me. I can't even tell whether pLyX is runnable as a command: a must for putting it into shellscripts. But either way, it's not a substitute for parsability...

Which brings us to your first question: Why do I need a parsable native format? The best answer I can give you is "general principle". Over many years with many different programs on many different operating systems, every time I had one of those programs emitting "magic format" native files, eventually I wished I could parse them.

I have tons of files in Micrografx Windows Draw format: best vector graphics program ever. But it doesn't work in Linux, and in fact the latest version came out last century. If I could parse it, I could convert it to SVG and continue with Inkscape. But Noooooo, it's a trade secret binary.

I had a Clarion (Rapid Application Development) app whose .app file suffered a little bit of corruption. It would have been pretty easy to tweak it back to life with a text editor, but it was a binary file. I had to drop back to a three day old backup: I lost three days of work.

I needed an automated tweak of an OpenOffice doc, so I could watermark it before sending it to customers.
But OpenOffice/LibreOffice docs are groups of XML files that, if they were a database, would be considered horribly denormalized. For all practical purposes, OpenOffice format might as well be considered secret binary: I had to do it in (gulp) MS Powerpoint binary.

A few weeks ago I upgraded my daily driver computer to Ubuntu 13.10, and a few of my Gnumeric files wouldn't open in Gnumeric. The file format was very complex XML I couldn't understand. I ended up taking the file to a computer with a different Gnumeric version, exporting to MS Excel, and then importing that in my daily driver's Gnumeric. If the XML had been easy enough to understand, I could have used the old "repeatedly remove half, see if the symptom went away" technique. But with XML, especially the kind where a single fact is represented in several branches and they all must agree (denormalized), you're likely to bust it further, so it's not good for troubleshooting.

Now let's look at the other side of the equation: Times I've had an accessible and parsable native format...

VimOutliner's (VO) native format is tab-indented ASCII. So far I've created VO to HTML, VO to Easy Menu Definition Language, VO to Troubleshooters.Com Linux Library web page, and many, many more. Others have created all sorts of extensions for VO: one guy made it into a calendaring program. A lot of these guys weren't professional programmers.

XHTML is easily parsable with Python's lxml.etree parser. So I used it to convert a Bluefish-created XHTML file into ePub. The hardest part of the job was understanding the ePub spec, complete with device idiosyncrasies.

About 5 to 10 times in the 12 years I've used LyX, there have been LyX files that couldn't be opened in LyX. No problem: I edited them in Vim and did the old "repeatedly remove half, see if the symptom went away" technique, eventually finding the one factor that prevented opening.
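
The "repeatedly remove half, see if the symptom went away" technique can be sketched roughly as follows in Python. This is only an illustration, not the procedure described above (that was done by hand in Vim). The names `find_culprit` and `symptom` are hypothetical, and the sketch assumes exactly one offending line and that removing innocent lines neither causes nor hides the failure. In a real LyX file you would cut at block boundaries so insets and layouts stay balanced.

```python
from typing import Callable, List, Sequence


def find_culprit(lines: Sequence[str],
                 symptom: Callable[[List[str]], bool]) -> int:
    """Return the index of the single line whose presence triggers the symptom.

    `symptom` is a hypothetical callback: it receives a trial copy of the
    file (as a list of lines) and returns True if the problem still shows.
    Assumes symptom(list(lines)) is True and there is exactly one culprit.
    """
    lo, hi = 0, len(lines)  # the culprit is somewhere in lines[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Remove the first half of the suspect range; keep everything else.
        trial = list(lines[:lo]) + list(lines[mid:])
        if symptom(trial):
            lo = mid  # symptom persists: culprit is in the kept half
        else:
            hi = mid  # symptom went away: culprit was in the removed half
    return lo


# Toy stand-in for "does the file fail to open?"
doc = ["\\begin_layout Standard", "good text", "BAD", "\\end_layout"]
culprit = find_culprit(doc, lambda ls: any("BAD" in line for line in ls))
```

The point of the sketch is that each trial only needs a plain list of text lines, which is exactly what a human-parsable native format gives you.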
My (LyX created) instructor notes have a slide-by-slide explanation of the accompanying (Powerpoint) presentation. From time to time I change the slides or slide order, so I really don't want to hard-code numbers into the Instructor Notes: I want the slides to be numbered on the fly in the Instructor Notes. At the time I first made this, I didn't know enough about LaTeX counters and LaTeX commands (and LyX didn't have char styles with which to implement the commands anyway) to do it that way. So instead, I built a preprocessor that went through my file, and every time it found blue text, it assigned an incremented number within LyX itself.

Most of my books are written with LyX. With eBooks, I hate DRM but want my recipients to understand it's not cool to copy my books without authorization, because I feed my family by selling those books. So I personalize every book with the person's name: The footer says "This book created expressly for John Q Public" or whatever the guy's name is. Doing this was trivial: The LyX file has a command called Licensee, and I have a script that copies the LyX file to a temporary, changes that variable to the buyer's name, and compiles the book to PDF.

One of my books, "Troubleshooting Techniques of the Successful Technologist", is available both as a print book and as a PDF eBook. By parsing and changing a few commands in the preamble, I can do either, while maintaining only one source file.

The bottom line is this: With a parsable native format, you know for sure you'll never get painted into a corner. That's a *powerful* feeling of security.

Thanks,

SteveT

Steve Litt * http://www.troubleshooters.com/
Troubleshooting Training * Human Performance
