With regard to the rewriter:

In the simplest form, you could have several services that each take a
Stream<Element> and return a Stream<Element>, so that they can be chained into
a full processing stream that is then used to parse the HTML.
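A minimal sketch of that chaining in plain Java. `Element`, `ElementService`, and `chain` are hypothetical stand-ins, not the bundle's API. Each service is assumed to be just a `UnaryOperator<Stream<Element>>`, so a pipeline is simply the services applied in order.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipelineSketch {

    // Hypothetical stand-in for the bundle's Element: a tag name plus an
    // optional href, just enough to show the pipeline shape.
    static final class Element {
        final String name;
        String href;

        Element(String name, String href) {
            this.name = name;
            this.href = href;
        }

        String toHtml() {
            return href == null ? "<" + name + ">" : "<" + name + " href=\"" + href + "\">";
        }
    }

    // Hypothetical service contract: Stream<Element> in, Stream<Element> out.
    interface ElementService extends UnaryOperator<Stream<Element>> {}

    // Composes services in order; the output of one feeds the next.
    // Nothing runs until a terminal operation pulls elements through.
    static ElementService chain(List<ElementService> services) {
        return input -> {
            Stream<Element> current = input;
            for (ElementService service : services) {
                current = service.apply(current);
            }
            return current;
        };
    }

    // Example services: make root-relative hrefs absolute, then drop <script>.
    static final ElementService ABSOLUTIZE = s -> s.map(e -> {
        if (e.href != null && e.href.startsWith("/")) {
            e.href = "http://www.apache.org" + e.href;
        }
        return e;
    });
    static final ElementService DROP_SCRIPTS = s -> s.filter(e -> !e.name.equals("script"));

    static String demo() {
        Stream<Element> parsed = Stream.of(
                new Element("a", "/docs"),
                new Element("script", null),
                new Element("a", "https://example.org/"));
        return chain(List.of(ABSOLUTIZE, DROP_SCRIPTS)).apply(parsed)
                .map(Element::toHtml)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // <a href="http://www.apache.org/docs"><a href="https://example.org/">
        System.out.println(demo());
    }
}
```

Because `Stream` is lazy, composing the services does no work until the terminal `collect` pulls elements through. That fits a pull-based parser.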

The easiest implementation would have a structure similar to the existing
rewriter, without a separate generator and processor, as you mentioned.

However, what would probably be more beneficial is to process this HTML
asynchronously, so that a large document could be parsed, processed, and
pushed out without maintaining state, the state in this case being the full
document.
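One way to get that effect uses only `java.util.concurrent`. Parse and process on a background thread, and hand elements to the writer through a small bounded queue. Memory use then depends on the queue size rather than the document size. This is a sketch under stated assumptions, not the async-servlet design being explored. Strings stand in for elements, and the `Iterator` stands in for the parser. The `process` method is a hypothetical name.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

public class AsyncPipelineSketch {

    /**
     * Pulls elements from a (hypothetical) parser and runs the processor on a
     * background thread, while the calling thread pushes results to the sink.
     * The bounded queue caps how many elements are buffered at once, so the
     * whole document is never held in memory.
     */
    static void process(Iterator<String> parser, UnaryOperator<String> processor,
                        Consumer<String> sink, int capacity) {
        // Optional.empty() marks the end of the element stream.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(capacity);
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "html-parser");
            t.setDaemon(true); // never keep the JVM alive if the sink fails
            return t;
        });
        try {
            Future<?> producer = executor.submit(() -> {
                try {
                    while (parser.hasNext()) {
                        // put() blocks while the queue is full (back-pressure).
                        queue.put(Optional.of(processor.apply(parser.next())));
                    }
                } finally {
                    queue.put(Optional.empty()); // always signal completion
                }
                return null;
            });
            for (Optional<String> item = queue.take(); item.isPresent(); item = queue.take()) {
                sink.accept(item.get());
            }
            producer.get(); // rethrows any failure from the parsing thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while streaming", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("parsing failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        process(List.of("<p>", "hello", "</p>").iterator(), String::toUpperCase, out::append, 2);
        System.out.println(out); // <P>HELLO</P>
    }
}
```

The queue capacity is the knob. A capacity of 1 keeps memory minimal, and a larger value lets the parser run further ahead of a slow client.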

There are a couple of ways that could be handled, but I'm still exploring. I'm
diving into async contexts to see whether they would help, which might lead to
implementing Asynchronous Servlet support in a more structured form. That
might be really useful at scale.


- Jason

On Thu, Oct 25, 2018, at 2:33 PM, Daniel Klco wrote:
> Jason,
> 
> This sounds like a great tool to create a new Rewriter. Would you see
> having OSGi Components as a subtype of Consumer being registered to provide
> the Transformers? Is there any reason to have a separate Generator and
> Processor?
> 
> Thanks,
> Dan
> 
> On Thu, Oct 25, 2018 at 12:57 PM Jason E Bailey <[email protected]> wrote:
> 
> > Yeah, I'm really bad at naming bundles.
> >
> > The new bundle currently provides a new "html5-generator" that will work
> > with the existing rewriter.
> >
> > How it works: it uses the same rules that web browsers do to determine
> > whether a tag in a document is one that needs to be handled or is part of
> > a text area. It then creates an Element object for that section and
> > passes it along when requested. This is a pull-based parser with no
> > structural validation. It won't rewrite your HTML unless you specifically
> > ask it to.
> >
> > An example generic usage:
> > Tag.stream(inputStream, "UTF-8")
> >         .filter(elem -> elem.getType() == ElementType.START_TAG)
> >         .count();
> >
> > or a more complex one:
> >
> > // stream: a Stream<Element>, e.g. from Tag.stream(inputStream, "UTF-8")
> > stream.map(element -> {
> >     // rewrite root-relative href values as absolute URLs
> >     if (element.containsAttribute("href")) {
> >         String value = element.getAttributeValue("href");
> >         if (value != null && value.startsWith("/")) {
> >             element.setAttribute("href", "http://www.apache.org" + value);
> >         }
> >     }
> >     // same treatment for src
> >     if (element.containsAttribute("src")) {
> >         String value = element.getAttributeValue("src");
> >         if (value != null && value.startsWith("/")) {
> >             element.setAttribute("src", "http://www.apache.org" + value);
> >         }
> >     }
> >     return element;
> > }).map(HtmlStreams.TO_HTML).forEach(System.out::print);
> >
> > This parses all of your HTML, finds href and src attributes that hold
> > root-relative paths, rewrites them as absolute URLs, and then converts the
> > individual nodes back to HTML.
> >
> > - Jason
> >
> >
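The pull-based, browser-rules approach described in the quoted message can be illustrated with a toy tokenizer. This is not the bundle's code. `Token`, `Type`, and `tokens(...)` are hypothetical names, and the rules are heavily simplified. Only `script`, `style`, `textarea`, and `title` are treated as raw text. Comments and doctypes are not recognized and come through as text.

```java
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class PullTokenizerSketch {

    enum Type { START_TAG, END_TAG, TEXT }

    // Hypothetical, much-simplified stand-in for the bundle's Element type.
    static final class Token {
        final Type type;
        final String raw;

        Token(Type type, String raw) {
            this.type = type;
            this.raw = raw;
        }
    }

    // Elements whose content browsers read as text rather than markup.
    private static final Set<String> RAW_TEXT = Set.of("script", "style", "textarea", "title");

    /** Returns a lazy stream: each token is only scanned when it is pulled. */
    static Stream<Token> tokens(String html) {
        Iterator<Token> it = new Iterator<>() {
            private int pos = 0;
            private String rawTextEnd = null; // e.g. "</script" while inside <script>

            @Override
            public boolean hasNext() {
                return pos < html.length();
            }

            @Override
            public Token next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (rawTextEnd != null) {
                    // Inside raw text: everything up to the matching end tag is text.
                    int end = indexOfIgnoreCase(html, rawTextEnd, pos);
                    rawTextEnd = null;
                    if (end < 0) end = html.length();
                    if (end > pos) return text(end);
                }
                if (!startsTag(pos)) {
                    // Plain text runs until the next real tag opener ("a < b" stays text).
                    int end = pos + 1;
                    while (end < html.length() && !startsTag(end)) end++;
                    return text(end);
                }
                int gt = html.indexOf('>', pos);
                int tagEnd = gt < 0 ? html.length() : gt + 1;
                String tag = html.substring(pos, tagEnd);
                pos = tagEnd;
                boolean isEnd = tag.startsWith("</");
                int nameStart = isEnd ? 2 : 1;
                int nameEnd = nameStart;
                while (nameEnd < tag.length() && Character.isLetterOrDigit(tag.charAt(nameEnd))) nameEnd++;
                String name = tag.substring(nameStart, nameEnd).toLowerCase(Locale.ROOT);
                if (!isEnd && RAW_TEXT.contains(name)) rawTextEnd = "</" + name;
                return new Token(isEnd ? Type.END_TAG : Type.START_TAG, tag);
            }

            // A tag opener is '<' followed by an ASCII letter or '/'.
            private boolean startsTag(int i) {
                if (html.charAt(i) != '<' || i + 1 >= html.length()) return false;
                char c = html.charAt(i + 1);
                return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '/';
            }

            private Token text(int end) {
                Token t = new Token(Type.TEXT, html.substring(pos, end));
                pos = end;
                return t;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    static int indexOfIgnoreCase(String s, String target, int from) {
        for (int i = from; i + target.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, target, 0, target.length())) return i;
        }
        return -1;
    }
}
```

For instance, `tokens("<p>1 < 2<script>if(a<b)x()</script>")` yields five tokens. `1 < 2` and `if(a<b)x()` each come through as a single TEXT token. Filtering on `Type.START_TAG` and calling `count()`, in the same shape as the quoted example, gives 2.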
