On 11/05/11 2:33 PM, Rasmus Lerdorf wrote:
On May 10, 2011, at 21:01, Gabriel Sosa <sosagabr...@gmail.com> wrote:
I'm basically using lynx to convert some html into plain text

basically replicating the following command:

*lynx -pseudo_inlines=off -hiddenlinks=merge -reload -cache=0 -notitle
-force_html -dump -nocolor -stdin*

I've been looking, but I haven't found any other library capable of
doing the same with "almost" the same quality.

You may be right that it does it better than other mechanisms and it
may be the way to go. But it sounds like you need it to be faster. You
are still not going to gain much simply by calling lynx from C. The
only way to speed this up is to not have to fork and exec a new
process on every request. One way to do that would be to figure out
how to talk to an already running instance of lynx. Then write a
little Gearman wrapper for them and launch a bunch of Gearman workers.
Another benefit of this approach is that you will be able to call lynx
asynchronously.

Rasmus is spot on, but another thought is that if your content is often
the same, caching it somehow (either with PHP code or with a PHP
extension--I would just try PHP code for starters) could yield large
speed-ups, too.

Ben.
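
One possible shape for the caching idea, as a minimal sketch in plain PHP.
It assumes the cache key can simply be a SHA-1 of the input HTML, that the
cache directory already exists and is writable, and that lynx is on the
PATH. The function names lynx_dump and cached_html_to_text and the
/tmp/lynx-cache path are made up for illustration; only the lynx flags come
from the command quoted above.

```php
<?php
// Sketch only: skip the fork/exec of lynx when the same HTML was already
// converted. All names here are hypothetical, not an existing library API.

// Runs the lynx command quoted in the thread, feeding $html on stdin.
function lynx_dump($html)
{
    $cmd = 'lynx -pseudo_inlines=off -hiddenlinks=merge -reload -cache=0 '
         . '-notitle -force_html -dump -nocolor -stdin';
    // stdin and stdout are pipes; stderr is inherited from the parent.
    $spec = array(0 => array('pipe', 'r'), 1 => array('pipe', 'w'));
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('could not start lynx');
    }
    fwrite($pipes[0], $html);
    fclose($pipes[0]);                       // signal end of input
    $text = stream_get_contents($pipes[1]);  // read the plain-text dump
    fclose($pipes[1]);
    proc_close($proc);
    return $text;
}

// Returns the cached conversion if present, otherwise calls $convert
// (any callable taking the HTML string) and stores the result.
function cached_html_to_text($html, $convert, $cacheDir)
{
    $file = $cacheDir . '/' . sha1($html) . '.txt';
    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;                  // cache hit: no new process
        }
    }
    $text = call_user_func($convert, $html);
    // Write to a temp file first, then rename, so a concurrent reader
    // never sees a half-written cache entry.
    $tmp = tempnam($cacheDir, 'tmp');
    if ($tmp !== false && file_put_contents($tmp, $text) !== false) {
        rename($tmp, $file);
    }
    return $text;
}
```

It would be called along the lines of
cached_html_to_text($html, 'lynx_dump', '/tmp/lynx-cache'). Passing the
converter as a callable keeps the cache independent of how lynx is reached,
so the same wrapper could sit in front of a Gearman worker instead. Nothing
here expires old entries, so that would need handling separately.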




--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php