Re: [gentoo-user] Trying to automate HTML --- pdf

2008-01-28 Thread Etaoin Shrdlu
On Sunday 27 January 2008, [EMAIL PROTECTED] wrote:

> Oh geez, I LOVE it!  I will play with it, it just might do the trick.
> It's sure not what I had been expecting, but if it works reliably, it
> is just the ticket.

Java applets and flash animations could possibly cause problems, since 
they might need a few seconds to initialize even after the page is fully 
loaded (and thus the stop button is already inactive). Of course, if 
the pages you load don't use java/flash this is not a problem; but there 
might be other pitfalls. For example, I've noticed that konqueror loads 
some complex pages in two or more stages, with a brief pause (and 
the stop button inactive) between one stage and the next.

You can check that by running something like

while true; do 
 dcop konqueror-8364 konqueror-mainwindow#1 actionIsEnabled stop; 
done

ie, continuously checking the status of the stop button, and you'll see 
something like

...
true
true
true
true
true
true
false
false
false
false
false
false
false
false
false
false
true
true
true
true
true
true
true
...
true
false

before the page is fully loaded and the status eventually settles 
to false. So, if the script runs the test during the short false 
interval, it might be fooled into thinking that the page has loaded. I 
have not investigated further the cause of this behavior (perhaps 
multiple-frame pages?), but these few facts alone should be enough to 
deserve extra attention and thorough testing before using the kludge.
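
To make the flapping easier to spot than in a raw stream of true/false
lines, the polling output can be collapsed into runs. This is only a
sketch: summarize_states is a made-up helper name, and the application id
konqueror-8364 is the one from the example above, which changes with every
konqueror instance.

```shell
#!/bin/bash
# Collapse a stream of "true"/"false" lines into runs, e.g. "true x6".
# summarize_states is a hypothetical helper, not part of dcop or KDE.
summarize_states() {
  local prev="" count=0 state
  while read -r state; do
    if [ "$count" -gt 0 ] && [ "$state" = "$prev" ]; then
      count=$((count + 1))
    else
      # the state changed: report the run that just ended, start a new one
      if [ "$count" -gt 0 ]; then
        printf '%s x%d\n' "$prev" "$count"
      fi
      prev=$state
      count=1
    fi
  done
  # flush the last run when the input ends
  if [ "$count" -gt 0 ]; then
    printf '%s x%d\n' "$prev" "$count"
  fi
}

# Usage (needs a running konqueror; the id is the one from the example):
# while true; do
#   dcop konqueror-8364 'konqueror-mainwindow#1' actionIsEnabled stop
# done | summarize_states
```

A short "false" run sandwiched between two "true" runs is the interval
that could fool the script.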

> Sheesh.  A bloomin' genius is what you are :-)

Thanks, glad you have at least a slightly better solution than before!
-- 
gentoo-user@lists.gentoo.org mailing list



Re: [gentoo-user] Trying to automate HTML --- pdf

2008-01-28 Thread felix
On Mon, Jan 28, 2008 at 04:01:12PM +0100, Etaoin Shrdlu wrote:

> You can check that by running something like
>
> while true; do
>  dcop konqueror-8364 konqueror-mainwindow#1 actionIsEnabled stop;
> done

That will bear protecting against.  I have the basic program working,
but it does need some fine tuning, and I will make it insist on having
the status stay good for a couple of seconds after.  One of the screwy
things is that the widget names change as URLs are loaded.  I mainly
add a lot of checks and put out an alert for manual intervention when
something off happens.
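
The "stay good for a couple of seconds" idea could look roughly like the
sketch below: the page only counts as loaded after the stop button has
been reported disabled for several polls in a row, and the wait gives up
after a fixed number of polls so a stuck page can trigger the alert. The
names probe_stop and wait_stable and all the numbers are assumptions for
illustration; the application id is the one quoted above.

```shell
#!/bin/bash
# Hypothetical probe: prints "true" while the stop button is enabled.
# The application id (konqueror-8364) differs on every run.
probe_stop() {
  dcop konqueror-8364 'konqueror-mainwindow#1' actionIsEnabled stop
}

# wait_stable PROBE NEEDED INTERVAL MAX_POLLS
# Returns 0 once PROBE has printed "false" NEEDED times in a row,
# or 1 if MAX_POLLS polls pass without that happening.
wait_stable() {
  local probe=$1 needed=$2 interval=$3 max_polls=$4
  local streak=0 polls=0 state
  while [ "$polls" -lt "$max_polls" ]; do
    state=$("$probe")
    polls=$((polls + 1))
    if [ "$state" = false ]; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$needed" ]; then
        return 0
      fi
    else
      streak=0   # "true" (or any unexpected output) restarts the count
    fi
    sleep "$interval"
  done
  return 1
}

# Example: 6 quiet polls, 0.5 s apart, give up after 240 polls.
# wait_stable probe_stop 6 0.5 240 || echo "page never settled" >&2
```

Passing the probe in as a function name keeps the waiting logic separate
from dcop, so it can be exercised with a fake probe before trusting it
with real pages.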

So far none of the web sites use flash or Java applets.  That would be
a real mess for printing alone.

-- 
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
 Felix Finch: scarecrow repairman  rocket surgeon / [EMAIL PROTECTED]
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o



[gentoo-user] Trying to automate HTML --- pdf

2008-01-27 Thread felix
I am trying to automate converting a URL into a pdf file.  These web
pages include javascript and fancy formatting, so the simple minded
converters just don't cut the ice.  My next plan was to hack up a real
browser so it would take two command line args, the URL and the print
file, render the page, print it to the pdf file, and exit.  From what
I know of some of them, they would have to be configured in advance,
and invocation would have to be strictly controlled so only one
instance runs at a time, at least per user.  I could probably create
several firefox user sessions and have each of them running
simultaneously, but multiple real users works for me too.

Firefox doesn't print to pdf, however.  But konqueror does.  By using
the DCOP interface, I can even pass it commands to load a URL and
print the page, altho I have to settle for the configured print file
name.  But since I have to run individual sessions anyway, that's no
big deal.  The commands look like this:

dcop konqueror-6352 'konqueror-mainwindow#1' openURL 'http://slashdot.org'
dcop konqueror-6352 html-widget2 print true

There's a bit more than that, since widget names change, but a simple
perl program handles it easily (so far!).
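
For readers without the perl program, the widget lookup might be sketched
in shell as below. It relies on "dcop APPID" listing the application's
object names one per line; picking the highest-numbered html-widget is an
assumption about which widget shows the newly opened page, and
pick_html_widget is a made-up name.

```shell
#!/bin/bash
# pick_html_widget: read DCOP object names on stdin and print the
# html-widgetN entry with the highest N. Returns 1 if there is none.
# Assumption: the newest widget is the one showing the page just opened.
pick_html_widget() {
  local n
  n=$(sed -n 's/^html-widget\([0-9][0-9]*\)$/\1/p' | sort -n | tail -n 1)
  if [ -z "$n" ]; then
    return 1
  fi
  printf 'html-widget%s\n' "$n"
}

# Usage (needs a running konqueror with this application id):
# widget=$(dcop konqueror-6352 | pick_html_widget) &&
#   dcop konqueror-6352 "$widget" print true
```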

However, there's a problem.  The openURL command returns without
waiting for the web page to finish loading, and the print command
does not wait for it to finish loading.  The print command does wait
for printing to finish before returning, which is nice.

This means I have to put in some arbitrary sleep 30 or so between
openURL and print to have a good chance of a complete printed
page, and even then, there is no guarantee it actually will be
complete.  We have to send these pdf files to a bank, and it would not
be good to send them incomplete pages, even if only one out of 100 or
even 1000.  There will be at least hundreds of these every day.

I started to look at sources but there is no konqueror-3.5.8.tar.gz
or anything similar.  No doubt most of the code is handled by Qt
widgets and KDE libs.

Here are my questions:

0.  Is there a better place to ask this?  I tried a KDE mailing list
and got no responses; there weren't even many views.

1.  Is there either a DCOP command to wait for a URL to be loaded or a
DCOP command like openURL which waits?

2.  Is there a source file for konqueror which I could hack to take
command line parameters without changing libraries or other code
which would affect the rest of KDE?  I don't have any problem with
a hacked and renamed konqueror command.

3.  Is there some other way of converting complicated web pages into
pdf?  If they don't understand javascript and style sheets and
everything else that a real browser does, they are useless to me.

4.  Are there other ways to do this that I haven't thought of?




Re: [gentoo-user] Trying to automate HTML --- pdf

2008-01-27 Thread Neil Bothwick
On Sun, 27 Jan 2008 09:06:15 -0800, [EMAIL PROTECTED] wrote:

> 1.  Is there either a DCOP command to wait for a URL to be loaded or a
> DCOP command like openURL which waits?

I can't see one, but it sounds like it would be useful enough to file a
bug report requesting one. A DCOP command to tell whether the page has
finished loading would be suitable.

> 2.  Is there a source file for konqueror which I could hack to take
> command line parameters without changing libraries or other code
> which would affect the rest of KDE?  I don't have any problem with
> a hacked and renamed konqueror command.

Konqueror is part of kdebase, so you'll find the source somewhere in
there.


-- 
Neil Bothwick

Where the system is concerned, you're not allowed to ask `Why?'




Re: [gentoo-user] Trying to automate HTML --- pdf

2008-01-27 Thread Etaoin Shrdlu
On Sunday 27 January 2008, [EMAIL PROTECTED] wrote:

> dcop konqueror-6352 'konqueror-mainwindow#1' openURL 'http://slashdot.org'
> dcop konqueror-6352 html-widget2 print true
>
> There's a bit more than that, since widget names change, but a simple
> perl program handles it easily (so far!).
>
> However, there's a problem.  The openURL command returns without
> waiting for the web page to finish loading, and the print command
> does not wait for it to finish loading.  The print command does wait
> for printing to finish before returning, which is nice.
[cut]
> 1.  Is there either a DCOP command to wait for a URL to be loaded or a
> DCOP command like openURL which waits?

I know of no direct method, and I can't answer your other questions 
either.
However, the following (admittedly *really* kludgy and quick-and-dirty) 
method *seems* to work:

dcop konqueror-6352 'konqueror-mainwindow#1' openURL 'http://my.url'

while true; do

  # check if the stop button is clickable
  stat=$(dcop konqueror-6352 'konqueror-mainwindow#1' actionIsEnabled stop)

  if [ "$stat" = true ]; then
    # stop button is active, so page is still loading
    sleep 5
  else
    # stop button is not active, page has loaded
    break
  fi
done

# do what you want here

As I said above, I did some tests and this seems to work. However, I'm 
not claiming that it's the solution to your problem, nor that it will 
always work as expected. Therefore, I strongly suggest you test it 
thoroughly before using it.

Hope that helped.



Re: [gentoo-user] Trying to automate HTML --- pdf

2008-01-27 Thread felix
On Sun, Jan 27, 2008 at 06:56:59PM +0100, Etaoin Shrdlu wrote:

> However, the following (admittedly *really* kludgy and quick-and-dirty)
> method *seems* to work:

Oh geez, I LOVE it!  I will play with it, it just might do the trick.
It's sure not what I had been expecting, but if it works reliably, it
is just the ticket.

Sheesh.  A bloomin' genius is what you are :-)
