Re: [NTG-context] EPUB XHTML Format
On Thu, 5 Sep 2013 19:22:42 Aditya Mahajan adit...@umich.edu wrote:

> How easy is it to create a new export format? IIRC, context keeps track of the entire document tree, and flushes the XML output only at the end. Is it possible to make this pluggable so that users can write their own transformers (in lua) on how the document tree can be written? This will enable more output formats (opendocument and (shudder) latex).

Or, (gasp!) MSword .docx

Alan
___
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___
Re: [NTG-context] EPUB XHTML Format
On 9/6/2013 10:20 PM, Thangalin wrote:

> Hi,
>
>> The best reader imho is iBooks on the iPad, nothing else, from what I've seen, comes close. But that is one expensive eReader. :(
>
> We'll just have everybody in the world who has a Kindle, Kobo, or other reader exchange their existing hardware, and then purchase an iPad plus iBook. Problem solved? ;-)
>
>> ConTeXt TeX reading xml -> export -> optional transform -> EPUB + CSS
>>
>> you want 'direct epub html from context' (no xslt) but on the other hand use xslt to map onto context while context can do xml directly ... chicken egg
>
> Well, given that ConTeXt doesn't actually produce validating EPUB documents, I suspect not many people will actually use that feature. It's great in theory, but if it produces books that don't actually work on the Kindle or Kobo, then it's unusable in practice -- never mind not being able to add the books to online marketplaces (such as Amazon) because, again, the output does not validate.

context doesn't produce epub (which at this moment is so floating that i would keep updating, which is fine if i'd use it myself or in projects at pragma, but not for the sake of keeping up) but does an export to xml (*.export)

as a bonus it can output some extra stuff so that in a browser that can deal with xml+css (and a few xhtml tags for hyperlinks) we can preview

then there is mtx-epub that can make an epub but that is a moving target (at some point we stopped extending, waiting for a decent standard)

so, i'd never claim that context produces epub but it can be used in a workflow that involves epub as it outputs xml which can be transformed

supporting all variants of epub in the backend would be the same as hardcoding all kinds of xml dtds in the frontend (docbook, tei, whatever); instead we provide a general xml handler and a general xml export

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
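To make the split Hans describes more concrete (ConTeXt exports XML; building the EPUB is a separate, later step), here is a minimal sketch of only the container part of that later step, written in Python rather than with mtx-epub. It relies on one packaging rule of the EPUB container format: the first entry of the zip archive is an uncompressed file named mimetype that contains application/epub+zip. The function name and the OEBPS/content.opf path are assumptions made for illustration, and the result is a skeleton, not a complete or validating EPUB (there is no package document and no content).

```python
import io
import zipfile

# META-INF/container.xml points the reader at the package document.
# The path "OEBPS/content.opf" is an illustrative assumption.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of a zip archive holding only the EPUB container files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

The transformed export (XHTML, CSS, images) and the package document would be added to the same archive after these two entries.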
Re: [NTG-context] EPUB XHTML Format
Hi,

> so, i'd never claim that context produces epub but it can be used in a workflow that involves epub as it outputs xml which can be transformed

That's a distinction that either might not matter or sometimes is lost:

http://tex.stackexchange.com/a/17642/2148
http://wiki.contextgarden.net/epub

"ConTeXt has preliminary epub (http://en.wikipedia.org/wiki/EPUB) support..."

Does ConTeXt refer to a suite of tools, or only the context command? Either way, it appears that the line between the command and the tool set is blurred a bit. This is completely understandable, too, as you wouldn't want to write, "the ConTeXt suite of tools includes a command, mtxrun, that can produce EPUB files" every time you talk about EPUBs.

> supporting all variants of epub in the backend would be the same as hardcoding all kind of xml dts in the frontend (docbook, tei, whatever); instead we provide a general xml handler and a general xml export

That paragraph would be an excellent addition to the wiki; not sure where though.

Kind regards.
Re: [NTG-context] EPUB XHTML Format
Another small note, since I just walked down the ePUB path: you'll be very sad to find out that a lot of rendering engines for popular readers are not consistent and won't render standard XHTML markup correctly (nest an ordered list within an unordered list and then look at it in Adobe Digital Editions and several other readers). "But it is just XHTML + CSS!" you'll cry, "How can they not render it correctly?" I don't know, but it was an extremely frustrating process. I even contacted Adobe to try and report this nested list bug to them... their suggestion was that I could *pay* them to work with content experts who would help me correct my source so that it would render correctly.

The best reader imho is iBooks on the iPad, nothing else, from what I've seen, comes close. But that is one expensive eReader. :(

On Thu, Sep 5, 2013 at 3:00 PM, Thangalin thanga...@gmail.com wrote:

> Hi,
>
>> handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate a XHTML
>
> Precisely.
>
>> If you need both EPUB and PDF, start with a semantically rich XML vocabulary, e.g. DocBook. In this case you can relatively easily transform
>
> My database doesn't generate DocBook. It generates a custom XML document from which I generate a web page, and a LaTeX document (though soon to be ConTeXt!). There is no reason, technically, why I cannot convert the source XML to either DocBook or directly to EPUB. There are, however, problems doing that, which Aditya correctly surmises:
>
>> - Automatic section numbering taking care of different conversions.
>> - Automatic index generation and sorting
>> - Inserting hyphenation points at the appropriate place in the generated output (so that the browser can effectively rely on TeX's hyphenation algorithm to do line-breaking).
>> - Convert TeX math to MathML.
>>
>> The current ConTeXt XML source can translate a well formed ConTeXt document into a XML document with the above features.
>
> Those are exactly the issues that I would love to resolve using ConTeXt for generating an EPUB. (The MathML isn't as important to me, but I can see other people wanting such a feature.)
>
>> What about accessibility? I expect that visually impaired people would depend on document structure rather than its visualisation.
>
> That is a good point. The current XML structure produced by ConTeXt (Hans correct me here if I'm mistaken) is not accessible, as it doesn't adhere to strict XHTML. I suspect that div tags would not be accessible -- the only way to provide true accessibility in EPUB format would be by using the strict XHTML tags.
>
>> for instance, we have more levels than H1..H6, so how to do H7? if someone has to deal with that, he/she can as well transform all into H1 with some class which is a local solution then
>
> I realize there is not going to be a one-to-one map of all possible ConTeXt macros to XHTML. For someone who has 7 levels of nested sections they would either have to rewrite some Lua or perform some post-processing (e.g., with XSLT). I would posit that a document with 7 levels of nested sections is not going to be a common occurrence.
>
> When I talk about strict XHTML, I'm proposing that a _simple_ ConTeXt document (up to 6 header levels, numbered and unnumbered lists, images, text emphasis, etc.) should generate a simple, validating XHTML document. Trying to attain 100% coverage of ConTeXt transmogrification to XHTML is ridiculous when, I suspect, 80% coverage would meet most needs. :-)
>
> It is definitely possible to translate the ConTeXt EPUB output to XHTML. However, there are practical realities that hinder such an approach. Architecturally, if anyone is going to translate an XML document to EPUB format, it certainly won't be this way:
>
>   XML + XSLT -> ConTeXt File -> ConTeXt EPUB XML + XSLT -> EPUB + CSS
>
> It'll be this way, which is less time-consuming, less complex, and less susceptible to error:
>
>   XML + XSLT (or API) -> EPUB + CSS
>
> However, it does not, as we all know, produce as feature rich output as leveraging the ConTeXt abilities that Aditya mentioned, which was the point:
>
>   XML + XSLT -> ConTeXt TeX -> EPUB + CSS
>
> Kindest regards.
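The nested-list case mentioned at the top of this message (an ordered list inside an unordered list) is easy to reproduce for testing on a given reader. The following is a minimal sketch in Python that builds such a test page as well-formed XHTML; the function name, the page title, and the item labels are made up for illustration, and the page still has to be wrapped into an EPUB before a reader will open it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def nested_list_page() -> str:
    """Build an XHTML page with an ordered list nested in an unordered list."""
    # Serialize the XHTML namespace as the default namespace (no prefix).
    ET.register_namespace("", XHTML_NS)

    def q(tag: str) -> str:
        # Qualify a tag name with the XHTML namespace.
        return f"{{{XHTML_NS}}}{tag}"

    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title")).text = "Nested list test"
    body = ET.SubElement(html, q("body"))

    outer = ET.SubElement(body, q("ul"))
    item = ET.SubElement(outer, q("li"))
    item.text = "Unordered item"

    # The ordered list lives inside the <li> of the unordered list.
    inner = ET.SubElement(item, q("ol"))
    for label in ("First", "Second"):
        ET.SubElement(inner, q("li")).text = label

    return ET.tostring(html, encoding="unicode")
```

Viewing the same page in several readers shows quickly whether the inner list keeps its numbering and indentation.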
Re: [NTG-context] EPUB XHTML Format
On 9/6/2013 12:00 AM, Thangalin wrote:

> That is a good point. The current XML structure produced by ConTeXt (Hans correct me here if I'm mistaken) is not accessible, as it doesn't adhere to strict XHTML. I suspect that div tags would not be accessible -- the only way to provide true accessibility in EPUB format would be by using the strict XHTML tags.

html is not rich enough .. one ends up with abusing tags which in turn is confusing for accessibility ... i once saw an epub where h1 was used for the chapter number and h2 for the chapter title

> When I talk about strict XHTML, I'm proposing that a _simple_ ConTeXt document (up to 6 header levels, numbered and unnumbered lists, images, text emphasis, etc.) should generate a simple, validating XHTML document. Trying to attain 100% coverage of ConTeXt transmogrification to XHTML is ridiculous when, I suspect, 80% coverage would meet most needs. :-)

in that case a few page transformation could do, isn't it?

>   XML + XSLT -> ConTeXt TeX -> EPUB + CSS

probably ok for novels but there is no way to limit the user ... so in the end we still have a complex mix to deal with ... i'd rather have

  ConTeXt TeX reading xml -> export -> optional transform -> EPUB + CSS

you want 'direct epub html from context' (no xslt) but on the other hand use xslt to map onto context while context can do xml directly ... chicken egg

Hans
Re: [NTG-context] EPUB XHTML Format
Hi,

> The best reader imho is iBooks on the iPad, nothing else, from what I've seen, comes close. But that is one expensive eReader. :(

We'll just have everybody in the world who has a Kindle, Kobo, or other reader exchange their existing hardware, and then purchase an iPad plus iBook. Problem solved? ;-)

> ConTeXt TeX reading xml -> export -> optional transform -> EPUB + CSS
>
> you want 'direct epub html from context' (no xslt) but on the other hand use xslt to map onto context while context can do xml directly ... chicken egg

Well, given that ConTeXt doesn't actually produce validating EPUB documents, I suspect not many people will actually use that feature. It's great in theory, but if it produces books that don't actually work on the Kindle or Kobo, then it's unusable in practice -- never mind not being able to add the books to online marketplaces (such as Amazon) because, again, the output does not validate.

Kind regards.
Re: [NTG-context] EPUB XHTML Format
On Fri, 6 Sep 2013, Thangalin wrote:

> Hi,
>
>> never mind not being able to add the books to online marketplaces (such as Amazon) because, again, the output does not validate.
>
> I think the simplest thing to do would be to update the wiki and have a note that informs readers that while ConTeXt can be used to generate an EPUB, it is likely that that EPUB will be unusable for devices without further transformation of the XML content. At least that way the knowledge is out there and people are forewarned that not all EPUB documents are equivalent.

It would also be nice to add a table that lists the EPUB readers (hardware and software) and tells whether ConTeXt-produced EPUB documents work on them.

Aditya
Re: [NTG-context] EPUB XHTML Format
Hi,

> never mind not being able to add the books to online marketplaces (such as Amazon) because, again, the output does not validate.

I think the simplest thing to do would be to update the wiki and have a note that informs readers that while ConTeXt can be used to generate an EPUB, it is likely that that EPUB will be unusable for devices without further transformation of the XML content. At least that way the knowledge is out there and people are forewarned that not all EPUB documents are equivalent.

Kindest regards.
Re: [NTG-context] EPUB XHTML Format
On 9/4/2013 7:55 PM, Thangalin wrote:

> Hi.
>
>> of course we could alternatively export all as <div class="tag-subtag-..."> but i don't like that too much; html itself is not rich enough for our purpose
>
> What about giving developers the ability to change the destination element? For example:
>
>   \setuplist[chapter][
>     xml={\starttag[h1]#1\stoptag}
>   ]
>
> Would produce, upon export:
>
>   <h1>Chapter</h1>

export doesn't happen at that level; something like that would add an ugly overhead; it's way easier to make some xslt script that converts the rather systematic export to something like that and it only has to be written once by someone (not me)

> Or (using export instead of xml; I don't care what it is named):
>
>   \setuplist[chapter][
>     export={\starttag[div]\startattribute[class]{chapter}#1\stopattribute\stoptag}
>   ]
>
> Similarly, this would produce:
>
>   <div class="chapter">Chapter</div>

you use some tex syntax but it all happens in lua; also, the only way to provide some kind of different tagging is to support plugins (read: lua functions) that could override default behaviour (but again, it's quite easy to do that as a postprocessing step)

> This would offer the flexibility of custom XML documents without affecting the default behaviour.
>
>>> * Generates XHTML headers (including <!DOCTYPE> and <html...>)
>>
>> not needed as we're 'standalone'
>
> Having the ability to produce the <!DOCTYPE...> and <html> elements could be as simple as:
>
>   \setupexport[
>     standalone=no,
>   ]
>
>>> * Produces images as img tags, rather than float tags.
>>
>> the css can deal with them (info is written to files for that)
>
> Yes, but they aren't standard. There is an ecosystem of tools (e.g., Calibre, normalizing CSS templates, etc.), not to mention a widespread knowledge-base, that groks the minimal XHTML specification. Plus, using XML tags that are not in the minimal XHTML spec. means more testing on more devices to make sure that their XHTML parsers render correctly.

most of the xml we get here is a funny mix of whatever tags and html (often for tables) and normally there is way more structure than in the average html document; the export is meant to be close to the source and turning it into some html / div mixture makes it messy

for instance, we have more levels than H1..H6, so how to do H7? if someone has to deal with that, he/she can as well transform all into H1 with some class which is a local solution then

>> xhtml has no typical tags .. it's xml + css (or xslt) ... unfortunately browsers have
>
> That is, a Strictly Conforming XHTML Document, as per:
>
> http://www.w3.org/TR/2000/REC-xhtml1-20000126/#docconf
>
>> the export of context is in fact just xml, and by tagging it as xhtml we can apply css to it; but if someone has a workflow for producing epub an option is to postprocess that xml file into whatever epub one wants

indeed. that was the idea: export xml, tag it as xhtml (with the option to provide hyperlinks, an exception), provide some standard css as starter and then let users deal with matters the way they like; you can be pretty sure that what you want is not the same as what someone else wants; and if more people want it, they can together write a transformation script (or hire someone)

keep in mind that the export itself is already tricky enough and for me it doesn't pay off to provide tons of additional functionality (well, it doesn't pay off to export anyway)

> I could transform the ConTeXt-generated XML into strictly conforming XHTML, but it was a step I was hoping to avoid. Right now my process is:
>
> 1. Convert XML data to a ConTeXt .tex file.
> 2. Convert ConTeXt to either PDF or EPUB.
> 3. Stylize EPUB using CSS.

but writing the transform that suits you is just one step (with you spending the time on it) while extending the export into a complete transformation and configuration thing would put the burden on me :-)

> I want to use ConTeXt here (instead of going directly from XML data to EPUB) because ConTeXt provides functionality such as multiple indexes, table-of-contents, and bundling the .epub. Having an extra step to generate strictly conforming XHTML is architecturally painful as it means transforming the document three times (XML -> ConTeXt, ConTeXt -> XML, then XML -> XHTML).

why is it painful? the export is quite generic and will not change; it is also flexible as it honors user defined sectioning and styling

every time we look into epub there's another issue ... it's not a standard but a reverse engineered application mess (happens often with xml: turn some application data structures into xml and call it a standard)

> Some book vendors only accept validating EPUBs. ConTeXt is documented as being able to generate EPUBs. The documentation should state the EPUBs do not validate and do not generate strictly conforming XHTML.

well, i, luigi and some others did tests: the thing is that epub is evolving
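As an illustration of the postprocessing route discussed in this message (adding the DOCTYPE and html wrapper outside ConTeXt rather than through a new \setupexport key), here is a minimal sketch in Python. It wraps an already transformed body fragment in an XHTML 1.0 Strict document. The function name and the sample fragment are assumptions made for illustration; the sketch does not validate the fragment, so the result is only strictly conforming if the fragment itself uses valid Strict markup.

```python
from xml.sax.saxutils import escape

# Public and system identifiers of the XHTML 1.0 Strict DTD.
XHTML_STRICT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def wrap_as_xhtml(title: str, body_markup: str) -> str:
    """Wrap a body fragment in an XHTML 1.0 Strict document shell.

    body_markup is inserted as-is, so it must already be well-formed
    and consist of block-level elements (Strict forbids bare text in body).
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"{XHTML_STRICT_DOCTYPE}\n"
        '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body_markup}\n</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(wrap_as_xhtml("Chapter", '<div class="chapter"><h1>Chapter</h1></div>'))
```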
Re: [NTG-context] EPUB XHTML Format
On Thu, 5 Sep 2013, Hans Hagen wrote:

> On 9/4/2013 11:20 AM, Hans Hagen wrote:
>
>> you get a representation in xml indeed, but not verbatim, but as close as possible to the generic (parent) structure elements in context
>
> probably the most straightforward xhtml export is a file with only
>
>   <div class="section"> ... <div class="..."> ... </div> </div>
>
> i.e. only divs and spans

How easy is it to create a new export format? IIRC, context keeps track of the entire document tree, and flushes the XML output only at the end. Is it possible to make this pluggable so that users can write their own transformers (in lua) on how the document tree can be written? This will enable more output formats (opendocument and (shudder) latex).

Aditya
Re: [NTG-context] EPUB XHTML Format
On 9/5/2013 7:57 PM, Khaled Hosny wrote:

> On Thu, Sep 05, 2013 at 09:57:59AM -0700, Thangalin wrote:
>
>> Hi,
>>
>>>   <div class="section"> ... <div class="..."> ... </div> </div>
>>>
>>> i.e. only divs and spans
>>
>> I think that would be a more robust output format, technically, easier to adapt, and more readily conform to the strict XHTML tag subset.
>
> What about accessibility? I expect that visually impaired people would depend on document structure rather than its visualisation.

For that purpose I'd make a nice special doc. But the basic export has at least the similar structure as the original. (After all, it's one of the reasons why we *can do* an export.)

Hans
Re: [NTG-context] EPUB XHTML Format
On 9/5/2013 8:20 PM, Aditya Mahajan wrote:

> The typical ConTeXt document has a lot of structure, and the XML export generates a well structured XML output. That can be directly used in most modern browsers that handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate a XHTML

but how hard would it be to make an xslt transformation from context.export to epub variants (ok, at some point i can look into it but only if there is a robust standard and i have devices to test it on)

and indeed the quality of the source is important

Hans
Re: [NTG-context] EPUB XHTML Format
On Thu, Sep 05, 2013 at 09:57:59AM -0700, Thangalin wrote:

> Hi,
>
>>   <div class="section"> ... <div class="..."> ... </div> </div>
>>
>> i.e. only divs and spans
>
> I think that would be a more robust output format, technically, easier to adapt, and more readily conform to the strict XHTML tag subset.

What about accessibility? I expect that visually impaired people would depend on document structure rather than its visualisation.

Regards,
Khaled
Re: [NTG-context] EPUB XHTML Format
On 05/09/2013 20:24, Hans Hagen wrote:

> On 9/5/2013 8:20 PM, Aditya Mahajan wrote:
>
>> The typical ConTeXt document has a lot of structure, and the XML export generates a well structured XML output. That can be directly used in most modern browsers that handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate a XHTML
>
> but how hard would it be to make an xslt transformation from context.export to epub variants (ok, at some point i can look into it but only if there is a robust standard and i have devices to test it on)
>
> and indeed the quality of the source is important

Sounds by far to be the cleanest approach.

Cheers,
mh
Re: [NTG-context] EPUB XHTML Format
On 9/4/2013 11:20 AM, Hans Hagen wrote:

> you get a representation in xml indeed, but not verbatim, but as close as possible to the generic (parent) structure elements in context

probably the most straightforward xhtml export is a file with only

  <div class="section"> ... <div class="..."> ... </div> </div>

i.e. only divs and spans
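The divs-and-spans idea can be tried as a postprocessing step on any XML tree. Below is a minimal sketch in Python, assuming hypothetical element names (section, title, paragraph, highlight) that are not taken from the actual ConTeXt export: every element becomes a div, or a span if its name is in an assumed set of inline elements, and the original element name is kept as the class.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline element names; every other element is treated as a block.
INLINE_TAGS = {"highlight", "link"}


def to_divs_and_spans(element: ET.Element) -> ET.Element:
    """Recursively rewrite an element tree so it contains only div and span."""
    html_tag = "span" if element.tag in INLINE_TAGS else "div"
    # The original element name survives as the CSS class.
    converted = ET.Element(html_tag, {"class": element.tag})
    converted.text = element.text
    converted.tail = element.tail
    for child in element:
        converted.append(to_divs_and_spans(child))
    return converted
```

Attributes of the source elements are dropped in this sketch; a real transform would have to decide which of them to carry over (for example as id or extra classes).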
Re: [NTG-context] EPUB XHTML Format
Hi,

>   <div class="section"> ... <div class="..."> ... </div> </div>
>
> i.e. only divs and spans

I think that would be a more robust output format, technically, easier to adapt, and more readily conform to the strict XHTML tag subset.

The other issue I encountered was this:

  \startfrontmatter
    \startstandardmakeup
      Title page
    \stopstandardmakeup
    \startstandardmakeup
      Copyright
    \stopstandardmakeup
    \completecontent
  \stopfrontmatter

This produced *Title pageCopyright* as text without any markup, which makes the EPUB output a bit difficult to parse. I thought the software should output something like:

  <div class="frontmatter">
    <div id="standardmakeup1" class="standardmakeup">Title page</div>
    <div id="standardmakeup2" class="standardmakeup">Copyright</div>
    <div class="contents"><!-- etc... --></div>
  </div>

This way the title and copyright pages can be styled independently.

Kindest regards.
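The numbered ids in the desired output above (standardmakeup1, standardmakeup2) could also be added after the fact. The following is a minimal sketch in Python, assuming the markup already consists of div elements with class attributes; the function name and its classes parameter are made up for illustration and are not part of ConTeXt.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Set


def add_sequential_ids(root: ET.Element, classes: Set[str]) -> None:
    """Give each div whose class is in `classes` an id of the form class + counter."""
    seen = Counter()
    # iter() walks the tree in document order, starting with root itself.
    for element in root.iter("div"):
        css_class = element.get("class")
        if css_class not in classes or element.get("id") is not None:
            continue
        seen[css_class] += 1
        element.set("id", f"{css_class}{seen[css_class]}")
```

With distinct ids in place, a stylesheet can address the title page and the copyright page separately, for example with #standardmakeup1 and #standardmakeup2.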
Re: [NTG-context] EPUB XHTML Format
On Thu, 5 Sep 2013, honyk wrote:

> On 2013-09-04 Thangalin wrote:
>
>> What needs to happen to take a minimal ConTeXt file (such as the attached) to produce a minimum viable EPUB that:
>
> It is always difficult to parse and further process not well structured plain text without advanced semantics. Garbage in, garbage out.

The typical ConTeXt document has a lot of structure, and the XML export generates a well structured XML output. That can be directly used in most modern browsers that handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate a XHTML

> If you need both EPUB and PDF, start with a semantically rich XML vocabulary, e.g. DocBook. In this case you can relatively easily transform (XSLT) input data into almost any format. These basic outputs like EPUB or PDF (via XSL-FO) you can get out-of-the-box. The ConTeXt output can be generated using dbcontext: http://dblatex.sourceforge.net/
>
> In sum, use XML as your primary source and from it derive everything else.

I haven't used XML-only toolchains. Is it possible to handle:

- Automatic section numbering taking care of different conversions.
- Automatic index generation and sorting
- Inserting hyphenation points at the appropriate place in the generated output (so that the browser can effectively rely on TeX's hyphenation algorithm to do line-breaking).
- Convert TeX math to MathML.

The current ConTeXt XML source can translate a well formed ConTeXt document into a XML document with the above features.

Aditya
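The hyphenation item in the list above amounts to marking break opportunities in the output text, which in XHTML can be done with the soft hyphen character (U+00AD). Here is a minimal sketch in Python of that last step only. It assumes the break positions are supplied by something else (in the workflow discussed here, TeX's hyphenation); it does not compute them, and the function name and the sample positions are illustrative.

```python
from typing import Iterable

SOFT_HYPHEN = "\u00ad"  # invisible unless the line is broken at that point


def insert_soft_hyphens(word: str, break_positions: Iterable[int]) -> str:
    """Insert a soft hyphen before each character index in break_positions."""
    pieces = []
    previous = 0
    # Sort and deduplicate; ignore positions at or beyond the word boundaries.
    for position in sorted(set(break_positions)):
        if 0 < position < len(word):
            pieces.append(word[previous:position])
            previous = position
    pieces.append(word[previous:])
    return SOFT_HYPHEN.join(pieces)


if __name__ == "__main__":
    # Break positions here are chosen by hand for the example.
    marked = insert_soft_hyphens("hyphenation", [2, 6])
    print(marked.replace(SOFT_HYPHEN, "-"))
```

How well a given reader honours soft hyphens is a separate question and would need the same per-device testing as the rest of the markup.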
Re: [NTG-context] EPUB XHTML Format
On 9/5/2013 7:22 PM, Aditya Mahajan wrote:

> On Thu, 5 Sep 2013, Hans Hagen wrote:
>
>> On 9/4/2013 11:20 AM, Hans Hagen wrote:
>>
>>> you get a representation in xml indeed, but not verbatim, but as close as possible to the generic (parent) structure elements in context
>>
>> probably the most straightforward xhtml export is a file with only
>>
>>   <div class="section"> ... <div class="..."> ... </div> </div>
>>
>> i.e. only divs and spans
>
> How easy is it to create a new export format? IIRC, context keeps track of the entire document tree, and flushes the XML output only at the end. Is it possible to make this pluggable so that users can write their own transformers (in lua) on how the document tree can be written? This will enable more output formats (opendocument and (shudder) latex).

sure, but first i want to clean up some code (it's rather complex) ... in principle there is a document tree so one can plug into that; alternatively one can load the xml tree and mess with that (probably easier if we provide some styles for it)

Hans
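The kind of pluggable writer asked about here can be pictured as a table of per-element handlers with a default fallback. The sketch below is in Python as a stand-in for the Lua functions mentioned in the thread; it is not ConTeXt's API, and the Node and Exporter classes, the tag names, and the handler signature are all hypothetical.

```python
from html import escape
from typing import Callable, Dict, List, Optional


class Node:
    """A hypothetical document-tree node: a tag, some text, and child nodes."""

    def __init__(self, tag: str, text: str = "",
                 children: Optional[List["Node"]] = None) -> None:
        self.tag = tag
        self.text = text
        self.children = children or []


# A handler receives the node and its already rendered children.
Handler = Callable[[Node, str], str]


def default_handler(node: Node, inner: str) -> str:
    """Fallback: a div whose class is the node's tag."""
    return f'<div class="{escape(node.tag)}">{escape(node.text)}{inner}</div>'


class Exporter:
    """Walks the tree and lets registered handlers override the default output."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}

    def register(self, tag: str, handler: Handler) -> None:
        self.handlers[tag] = handler

    def render(self, node: Node) -> str:
        inner = "".join(self.render(child) for child in node.children)
        handler = self.handlers.get(node.tag, default_handler)
        return handler(node, inner)
```

A user who wants chapters as h1 would register one handler for the chapter tag and leave everything else to the default, which keeps the default behaviour untouched for other users.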
Re: [NTG-context] EPUB XHTML Format
On 2013-09-04 Thangalin wrote:

> What needs to happen to take a minimal ConTeXt file (such as the attached) to produce a minimum viable EPUB that:

It is always difficult to parse and further process not well structured plain text without advanced semantics. Garbage in, garbage out.

If you need both EPUB and PDF, start with a semantically rich XML vocabulary, e.g. DocBook. In this case you can relatively easily transform (XSLT) input data into almost any format. These basic outputs like EPUB or PDF (via XSL-FO) you can get out-of-the-box. The ConTeXt output can be generated using dbcontext: http://dblatex.sourceforge.net/

In sum, use XML as your primary source and from it derive everything else.

Jan
Re: [NTG-context] EPUB XHTML Format
Hi,

> handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate a XHTML

> Precisely. If you need both EPUB and PDF, start with a semantically rich XML vocabulary, e.g. DocBook. In this case you can relatively easy transform

My database doesn't generate DocBook. It generates a custom XML document from which I generate a web page, and a LaTeX document (though soon to be ConTeXt!). There is no reason, technically, why I cannot convert the source XML to either DocBook or directly to EPUB. There are, however, problems doing that, which Aditya correctly surmises:

> - Automatic section numbering taking care of different conversions.
> - Automatic index generation and sorting.
> - Inserting hyphenation points at the appropriate place in the generated output (so that the browser can effectively rely on TeX's hyphenation algorithm to do line-breaking).
> - Convert TeX math to MathML.
>
> The current ConTeXt XML source can translate a well formed ConTeXt document into a XML document with the above features.

Those are exactly the issues that I would love to resolve using ConTeXt for generating an EPUB. (The MathML isn't as important to me, but I can see other people wanting such a feature.)

> What about accessibility? I expect that visually impaired people would depend on document structure rather than its visualisation.

That is a good point. The current XML structure produced by ConTeXt (Hans, correct me here if I'm mistaken) is not accessible, as it doesn't adhere to strict XHTML. I suspect that div tags would not be accessible -- the only way to provide true accessibility in EPUB format would be by using the strict XHTML tags.

> for instance, we have more levels than H1..H6, so how to do H7? if someone has to deal with that, he/she can as well transform all into H1 with some class which is a local solution then

I realize there is not going to be a one-to-one map of all possible ConTeXt macros to XHTML. Someone who has 7 levels of nested sections would either have to rewrite some Lua or perform some post-processing (e.g., with XSLT). I would posit that a document with 7 levels of nested sections is not going to be a common occurrence.

When I talk about strict XHTML, I'm proposing that a _simple_ ConTeXt document (up to 6 header levels, numbered and unnumbered lists, images, text emphasis, etc.) should generate a simple, validating XHTML document. Trying to attain 100% coverage of ConTeXt transmogrification to XHTML is ridiculous when, I suspect, 80% coverage would meet most needs. :-)

It is definitely possible to translate the ConTeXt EPUB output to XHTML. However, there are practical realities that hinder such an approach. Architecturally, if anyone is going to translate an XML document to EPUB format, it certainly won't be this way:

    XML + XSLT -> ConTeXt file -> ConTeXt EPUB XML + XSLT -> EPUB + CSS

It'll be this way, which is less time-consuming, less complex, and less susceptible to error:

    XML + XSLT (or API) -> EPUB + CSS

However, it does not, as we all know, produce output as feature-rich as leveraging the ConTeXt abilities that Aditya mentioned, which was the point:

    XML + XSLT -> ConTeXt TeX -> EPUB + CSS

Kindest regards.
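To make the heading-depth question concrete, here is a minimal Python sketch of one possible post-processing rule. It is not something ConTeXt does today: the function names and the `level-N` class are hypothetical, and clamping to h6 is only one choice (mapping everything to h1 with a class, as suggested in the quoted reply, would work the same way).

```python
from xml.sax.saxutils import escape, quoteattr


def heading_for_depth(depth, max_level=6):
    """Return (tag, attributes) for a section nested `depth` levels deep (1-based)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    if depth <= max_level:
        return f"h{depth}", {}
    # XHTML stops at h6: reuse the deepest heading element and keep the
    # real nesting depth in a (hypothetical) class so CSS can still style it.
    return f"h{max_level}", {"class": f"level-{depth}"}


def render_heading(depth, title):
    """Serialize one heading element with escaped text and quoted attributes."""
    tag, attrs = heading_for_depth(depth)
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<{tag}{attr_text}>{escape(title)}</{tag}>"
```

A seventh-level section then stays a valid XHTML heading while a stylesheet can still target the `level-7` class.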
Re: [NTG-context] EPUB XHTML Format
I'd say use an XML source (DocBook, TEI, or DITA) and then write a ConTeXt stylesheet to typeset your XML. See http://wiki.contextgarden.net/TEI_xml

I think that TEI-lite is a nice, very general XML vocabulary...

Best,
Mica

On Thu, Sep 5, 2013 at 11:24 AM, Hans Hagen pra...@wxs.nl wrote:

> On 9/5/2013 8:20 PM, Aditya Mahajan wrote:
>
>> The typical ConTeXt document has a lot of structure, and the XML export generates a well structured XML output. That can be directly used in most modern browsers that handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate a XHTML
>
> but how hard would it be to make an xslt transformation from context.export to epub variants (ok, at some point i can look into it but only if there is a robust standard and i have devices to test it on)
>
> and indeed the quality of the source is important
>
> Hans
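For a feel of what such a transformation from the export to XHTML-style elements involves, here is a small sketch written in Python with the standard library rather than XSLT. The element names in `TAG_MAP` are assumptions chosen for illustration; they are not a claim about the vocabulary the ConTeXt export actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from export element names to XHTML element names.
TAG_MAP = {
    "document": "body",
    "paragraph": "p",
    "itemgroup": "ul",
    "item": "li",
}


def to_xhtml(node):
    """Recursively copy `node`, renaming known elements and wrapping the rest."""
    tag = TAG_MAP.get(node.tag)
    if tag is None:
        # Unknown structure element: fall back to a div carrying the old name.
        out = ET.Element("div", {"class": node.tag})
    else:
        out = ET.Element(tag)
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(to_xhtml(child))
    return out


def convert(export_xml):
    """Parse an export-like XML string and return the converted markup."""
    root = ET.fromstring(export_xml)
    return ET.tostring(to_xhtml(root), encoding="unicode")
```

The fallback branch is the div-with-class approach discussed in this thread; only elements listed in the mapping become semantic XHTML tags.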
Re: [NTG-context] EPUB XHTML Format
On 9/4/2013 3:19 AM, Thangalin wrote:

> Hi,
>
> The attached t.tex file produces the attached t.xhtml file. I have looked at the following documents:
>
> * http://en.wikipedia.org/wiki/EPUB#Open_Publication_Structure_2.0.1
> * http://en.wikipedia.org/wiki/DTBook
> * http://www.idpf.org/epub/20/spec/OPS_2.0.1_draft.htm
> * http://www.w3.org/TR/xhtml11/doctype.html
> * http://www.w3.org/TR/html5/sections.html
>
> It seems that the macros in t.tex are being written out as XML elements, verbatim. It is my understanding that these XML elements, however, do not conform to the minimal content models associated with XHTML 1.1.

you get a representation in xml indeed, but not verbatim, but as close as possible to the generic (parent) structure elements in context

of course we could alternatively export all as <div class="tag-subtag-..."> but i don't like that too much; html itself is not rich enough for our purpose

> What needs to happen to take a minimal ConTeXt file (such as the attached) to produce a minimum viable EPUB that:
>
> * Generates XHTML headers (including !DOCTYPE and html...)

not needed as we're 'standalone'

> * Produces images as img tags, rather than float tags.

the css can deal with them (info is written to files for that)

the only real problematic thing is hyperlinks as css has no provision for that so there's an option to inject a...

> * Uses typical XHTML tags for body elements (e.g., ol for ordered lists).

xhtml has no typical tags .. it's xml + css (or xslt) ... unfortunately browsers have messed up html so much (extensions, too tolerant support for unmatched tags, different rendering models) that xhtml never really took off

the export of context is in fact just xml, and by tagging it as xhtml we can apply css to it; but if someone has a workflow for producing epub an option is to postprocess that xml file into whatever epub one wants (i.e. the export is generic and carries as much info as possible)

> Ideally, I would like to do something such as:
>
> * context t.tex
> * mtxrun --script epub --make t.specification
>
> to generate an EPUB that passes validation of epubcheck (http://code.google.com/p/epubcheck/wiki/Library), with an output XHTML file that more closely matches the XHTML specification.

every time we look into epub there's another issue ... it's not a standard but a reverse-engineered application mess (happens often with xml: turn some application data structures into xml and call it a standard)

I only tested (long ago already) with some firefox plugin (i don't have a recent epub device, only an old first generation one which is dead slow, never really used, probably broken by now) and i refuse to buy a new one till resolution is decent (and i only want generic devices, not something bound to some shop)

> How can I help?

by testing

as i have no real use/demand for epub it's not something i look into on a daily basis

Hans
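For anyone postprocessing the export into an EPUB themselves, the container layer is the easy part to sketch. The Python below is an illustration only and does not reproduce what mtx-epub does: the function name, the `OEBPS/content.opf` path, and the shape of the `files` argument are assumptions. It covers two container rules that validators check: the `mimetype` entry comes first and is stored uncompressed, and `META-INF/container.xml` points at the package file.

```python
import zipfile

# Minimal container file; the OPF path is an assumption for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, files):
    """Write an EPUB-style zip to `target` (a path or file object).

    `files` maps archive names to bytes (the OPF, XHTML, CSS, images).
    """
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must be the first one and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A correct container is necessary but far from sufficient: the package file and every content document still have to be valid for the result to pass epubcheck.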
Re: [NTG-context] EPUB XHTML Format
Hi.

> of course we could alternatively export all as <div class="tag-subtag-..."> but i don't like that too much; html itself is not rich enough for our purpose

What about giving developers the ability to change the destination element? For example:

    \setuplist[chapter][
      xml={\starttag[h1]#1\stoptag}
    ]

Would produce, upon export:

    <h1>Chapter</h1>

Or (using export instead of xml; I don't care what it is named):

    \setuplist[chapter][
      export={\starttag[div]\startattribute[class]{chapter}#1\stopattribute\stoptag}
    ]

Similarly, this would produce:

    <div class="chapter">Chapter</div>

This would offer the flexibility of custom XML documents without affecting the default behaviour.

>> * Generates XHTML headers (including !DOCTYPE and html...)
>
> not needed as we're 'standalone'

Having the ability to produce the !DOCTYPE... and html elements could be as simple as:

    \setupexport[
      standalone=no,
    ]

>> * Produces images as img tags, rather than float tags.
>
> the css can deal with them (info is written to files for that)

Yes, but they aren't standard. There is an ecosystem of tools (e.g., Calibre, normalizing CSS templates, etc.), not to mention a widespread knowledge-base, that groks the minimal XHTML specification. Plus, using XML tags that are not in the minimal XHTML spec. means more testing on more devices to make sure that their XHTML parsers render correctly.

> xhtml has no typical tags .. it's xml + css (or xslt) ... unfortunately browsers have

That is, a Strictly Conforming XHTML Document, as per: http://www.w3.org/TR/2000/REC-xhtml1-20000126/#docconf

> the export of context is in fact just xml, and by tagging it as xhtml we can apply css to it; but if someone has a workflow for producing epub an option is to postprocess that xml file into whatever epub one wants

I could transform the ConTeXt-generated XML into strictly conforming XHTML, but it was a step I was hoping to avoid. Right now my process is:

1. Convert XML data to a ConTeXt .tex file.
2. Convert ConTeXt to either PDF or EPUB.
3. Stylize EPUB using CSS.

I want to use ConTeXt here (instead of going directly from XML data to EPUB) because ConTeXt provides functionality such as multiple indexes, table-of-contents, and bundling the .epub. Having an extra step to generate strictly conforming XHTML is architecturally painful as it means transforming the document three times (XML -> ConTeXt, ConTeXt -> XML, then XML -> XHTML).

> every time we look into epub there's another issue ... it's not a standard but a reverse-engineered application mess (happens often with xml: turn some application data structures into xml and call it a standard)

Some book vendors only accept validating EPUBs. ConTeXt is documented as being able to generate EPUBs. The documentation should state that the generated EPUBs do not validate and do not contain strictly conforming XHTML.

I have spent the last three weeks converting documents from LaTeX to ConTeXt because the documentation stated that ConTeXt can produce EPUBs. While true, the documentation did not mention its shortcomings. Had I known in advance, I probably would have gone straight to EPUB using Java or, with a little revulsion, PHP classes. ;-) That said, I probably should have tested this feature sooner. :-)

> as i have no real use/demand for epub it's not something i look into on a daily basis

How can I help resolve these issues? Merely testing (which I am happy to do) isn't going to produce a strictly conforming XHTML document.

Kindest regards.
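The missing document wrapper is the smallest piece of that extra step. As a sketch only, assuming the body markup has already been converted to XHTML elements by some earlier stage, the hypothetical `wrap_xhtml` helper below adds the XML declaration, the XHTML 1.1 DOCTYPE, and the html element. It checks well-formedness, which is weaker than validating against the DTD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML11_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
)


def wrap_xhtml(title, body_markup):
    """Wrap already-converted body markup in a standalone XHTML 1.1 document."""
    # Raises xml.etree.ElementTree.ParseError if the fragment is not well-formed.
    ET.fromstring("<body>" + body_markup + "</body>")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        + XHTML11_DOCTYPE + "\n"
        + '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        + "<head><title>" + escape(title) + "</title></head>\n"
        + "<body>" + body_markup + "</body>\n"
        + "</html>\n"
    )
```

Whether the result is strictly conforming still depends on the body using only elements and nesting that the DTD allows, so a validator run remains necessary.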
[NTG-context] EPUB XHTML Format
Hi,

The attached t.tex file produces the attached t.xhtml file. I have looked at the following documents:

- http://en.wikipedia.org/wiki/EPUB#Open_Publication_Structure_2.0.1
- http://en.wikipedia.org/wiki/DTBook
- http://www.idpf.org/epub/20/spec/OPS_2.0.1_draft.htm
- http://www.w3.org/TR/xhtml11/doctype.html
- http://www.w3.org/TR/html5/sections.html

It seems that the macros in t.tex are being written out as XML elements, verbatim. It is my understanding that these XML elements, however, do not conform to the minimal content models associated with XHTML 1.1.

What needs to happen to take a minimal ConTeXt file (such as the attached) to produce a minimum viable EPUB that:

- Generates XHTML headers (including !DOCTYPE and html...)
- Produces images as img tags, rather than float tags.
- Uses typical XHTML tags for body elements (e.g., ol for ordered lists).

Ideally, I would like to do something such as:

- context t.tex
- mtxrun --script epub --make t.specification

to generate an EPUB that passes validation of epubcheck (http://code.google.com/p/epubcheck/wiki/Library), with an output XHTML file that more closely matches the XHTML specification.

How can I help?

Kind regards.

Attachments: t.tex (TeX document), t.xhtml (application/xhtml), epub-errors.log (binary data)