David, maybe you could update the web site to reflect
the fact that 1.1 is out:)
And we should send out an announcement to plucker-announce, and it
should be posted to PalmGear.
Bill
Seems to me that I should also support 16-bit color in the image
processing programs, now that the 4.0 ROMs are out and I can test it.
Bill
I would suggest adding a warning to the parser to that effect in the
meantime; it's a nasty surprise to find no images when it looked like it
*should* have worked, ie "convert" command run, manifest of written parts
output, PDB generated, etc...
I think this is a good idea. We should figure
According to my calculations the Plucker databases are
(on average) 10% smaller than iSilo databases.
Yes, I've measured around the same difference.
Bill
Note, several Red Hat 7.0 users have reported that they do not have the
problem, even when using the same python version I originally reported
(if I remember correctly). Bill Janssen was one, I think.
Yes, that's right, if we're still talking about the same problem.
Bill
To summarize: If the document is tagged with a title, use it
(perhaps a new option --use-doc-title); if not do something
reasonable like use the db-name or the url. I would much prefer if
plucker displayed the document title (if available, or (none) if
one is not available) in the
I've been waiting for the 4.0 docs to show up before doing the 16-bit code
for netpbm and PIL. But I believe that Windows uses ImageMagick, so perhaps
someone should do the Palm image format integration for ImageMagick.
I've got the netpbm almost done, and the PIL should be very easy, so I'll
I've got the 16-bit image conversion working in PIL, now. I'm working
on the netpbm code. Will post when done.
Bill
David,
Thanks for creating the documents and putting them where others can find them!
I'd suggest submitting them to Memoware, though, instead of Palmgear.
They are set up for lots of new documents, and have a Plucker channel.
To list the various Plucker-format documents there, use
I'm not sure what Chris and Bill use, but I assume Linux as well.
I'm a Solaris weenie.
Bill
I've checked in changes to the ImageParser.py module and
PalmImagePlugin.py that support the 16-bit color standard. Slow as
heck, but it works. I'm working on the netpbm version now.
Bill
Thanks, Chris. I'll add that check in.
Bill
Anyone have the spec for the new PalmOS 4.0 image compression
algorithm called PackBits?
Bill
I've updated the parser code for NewNetPBMImageParser to handle 16-bit
direct color (--bpp=16). Remember that to use NewNetPBMImageParser,
in your ~/.pluckerrc file you set image_parser to netpbm2.
To use it, you will need the latest version of my netpbm subpackage
for Palm image formats, which
Not many other uses for
forms make sense for offline web viewing. Other ideas are welcome...
Perhaps local forms...
I've been thinking about the types of electronic books I keep on my
Palm. There seem to be three major categories:
1) Text and image documents, like Plucker's natural
a full tar file of that directory,
instead, if that would be of more assistance. Please let me know.
Bill
-
Bill Janssen [EMAIL PROTECTED] (650) 812-4763 FAX: (650) 812-4777
Xerox Palo Alto Research Center, Coyote Hill Rd, Palo Alto, CA 94304
*** 1.1 2001/06/07 03:25:10
Are these changes getting pushed upstream to Bryan?
Yes. They'll be in the next netpbm release.
Bill
Any ideas? Can Python even spawn threads?
Python threading works pretty well, and is very portable.
Bill
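For anyone who wants to try it, here is a minimal sketch of fetching several pages in parallel with the standard threading module. It is written for a current Python; fetch_all and fake_fetch are hypothetical names for illustration, not parser code, and a real fetch function would do the network I/O.

```python
import threading

def fetch_all(urls, fetch):
    """Call fetch(url) for each URL in its own thread; return {url: result}."""
    results = {}
    lock = threading.Lock()

    def worker(url):
        data = fetch(url)          # runs concurrently with the other workers
        with lock:                 # protect the shared dict
            results[url] = data

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # wait for every fetch to finish
    return results

def fake_fetch(url):
    # Hypothetical stand-in for a real network fetch.
    return "contents of " + url

pages = fetch_all(["http://a.example/", "http://b.example/"], fake_fetch)
```

One thread per URL is fine for a handful of pages; a large spidering run would probably want a fixed pool of workers instead.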
Before anyone starts implementing, let me point out that I was
kidding! I see no good reason in the parser to introduce the
complexity of having a system written in multiple languages.
Bill
if we could come up with a good architecture that can be implemented in
multiple languages -- or in
Bill, you can pull the image-writing code from what I've done for netpbm.
Bill
How about calling it the Document Library (instead of Database
Manager)?
Bill
On Fri, Jul 13, 2001, Bill Janssen wrote:
How about calling it the Document Library (instead of Database
Manager)?
Sure, but we should probably shorten it to just Library in the menu.
/Mike
That sounds good to me.
Bill
A single pixel
scroll is a bit slow for reading right now on the hardware (about same speed
as pressing down on scrollbar), but a double pixel scroll, (plus an
overclocker like Afterburner) works quite nicely. Or can think about adding
in caching for autoscrolling, though the next generation
Any recent version of tar can (should) handle uncompressing bz2
files natively now. There was an argument switch a few revs back which
changed the -I to -j or --bzip2 on uncompress. I believe on your Solaris 2.6
system, you can use this to uncompress a bzip2 file compressed as such,
Some people prefer the ragged right edge on their text, while
others (like myself) prefer the nice sharp vertical right edge that
justified text provides. Adding this into the viewer (for runtime changes
from justified to ragged) will be a bit more code, and my brain can't quite
Yep, it seems like the parser will replace the &nbsp; with a different
kind of space (0xA0 instead of 0x20). The viewer will not recognize
them as spaces so they will not be removed.
Yes, 0xA0 is the Unicode character code for non-breaking space. Since
nbsp is not one of the characters
I'd like this to work, too, so I'll take a look at the parser code on
this. Can you have a paragraph which doesn't look like a paragraph
when viewed? I don't think we want to break a line in mid-sentence
just because there's an anchor there.
Bill
16-bit works (currently) with PIL. Not sure if Netpbm2 supports this
as a native format yet. Bill?
The parser code does. Not sure the 16-bit updates for pnmtopalm are
in yet. I'll check on that.
Bill
OK, after visiting with Mike last week, I've gotten fired up to do
some parser work. I'm going to run through the current bugs list, and
see what's on there. Anything simple to fix I'll fix. Then I'm going
to start doing some profiling and tighten the whole thing up. Then I
plan to add
I've fixed bug 5 (bad parsing of identify from later versions of
ImageMagick), but it should be tested on Windows and perhaps by
someone who cares about ImageMagick. I've checked in the fix, so
please test from the CVS.
Bill
BTW: The new WindowsImageParser will use PIL and Bmp2Tbmp.
If we use PIL, do we need Bmp2Tbmp? PIL will write Palm format
directly.
Bill
[not surprising, IMHO]
Subject: RE: PDB access performance
From: Peter Epstein [EMAIL PROTECTED]
Date: Wed, 15 Aug 2001 19:08:15 PDT
To: Palm Developer Forum [EMAIL PROTECTED]
In a nutshell: Finding records by unique ID is slow (linear). Finding
records by index is fast (almost constant time).
Lowell, if you can send me a full trace of the plucker-build command
and crash on your system, I'll try to track this down. Try specifying
-V 2 on the command line, so that we get good debugging info.
Bill
I use the following
Good one! OK, thanks.
(I don't include the version number since it's
not used in a Plucker document),
We should. One of the worst mistakes to make in protocol and format
design is to leave out or ignore the version identifier.
Bill
OK, thanks. I'm going to add to it to describe the Palm headers as well.
Bill
On Tue, Sep 11, 2001, Bill Janssen wrote:
Why is it expressed as a TeX file when it's really HTML inside?
Because I wrote this description as an HTML document and later on it
was also included in the User's
According to PluckerDB.tex, the version number indicates which
compression format is being used. I think we should use the regular
PalmOS version to indicate more basic differences in structure. For
instance, whether we want to have multiple index records, or some
such. This seems to work OK,
I would like to propose the following error handling paradigm: set up a
timer before getting a document. When the timer expires or a socket error or
similar happens, just ignore *THIS* document and continue with the
This is a good idea in any case. I'll put it in as an option.
Bill
Winsock sends this error in this case, but the error is only a warning
and could be ignored; Python, however, raises an exception.
We could catch the exception and re-try, instead, I suppose.
Bill
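A rough sketch of how the two ideas could combine: set a socket timeout, retry a failed fetch, and skip the document if it still fails. This assumes a current Python, where socket timeouts and connection resets are both OSError subclasses; fetch_with_retry, gather, and flaky_fetch are hypothetical names, not the parser's actual functions.

```python
import socket

def fetch_with_retry(url, fetch, retries=1):
    """Call fetch(url), retrying on socket errors; re-raise the last error."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return fetch(url)
        except OSError as err:     # covers timeouts and connection resets
            last_error = err
    raise last_error

def gather(urls, fetch, timeout=30.0, retries=1):
    """Fetch every URL; skip the ones that keep failing and carry on."""
    socket.setdefaulttimeout(timeout)  # applies to sockets created afterwards
    documents, skipped = {}, []
    for url in urls:
        try:
            documents[url] = fetch_with_retry(url, fetch, retries)
        except OSError:
            skipped.append(url)    # ignore *this* document, continue with the rest
    return documents, skipped

calls = {}

def flaky_fetch(url):
    # Hypothetical fetcher: one site is down, another fails on the first try.
    calls[url] = calls.get(url, 0) + 1
    if url == "http://down.example/":
        raise socket.timeout("timed out")
    if url == "http://flaky.example/" and calls[url] == 1:
        raise ConnectionResetError("reset by peer")
    return "body of " + url

documents, skipped = gather(
    ["http://ok.example/", "http://flaky.example/", "http://down.example/"],
    flaky_fetch,
)
```

The timeout and retry count would be natural candidates for the option mentioned above.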
Why is the plucker viewer not displaying empty lines for example in
---
dummy line<br>
<br>
new line
---
Note the text in section 9.1 of the HTML 4.01 specification:
...authors should not rely on user agents to render white space
immediately after a start tag or immediately before an
I noticed that the current netpbm HISTORY file didn't report the
upgrade of netpbm to support 16-bit PalmOS color, so I asked about it.
The patches have been incorporated since last June; the HISTORY file
just omitted to say so, and will say so in the next release.
Bill
Thanks, that helped.
Bill
Thanks for the review, Dirk.
- You might note that some values may be non-zero when reading a
Plucker DB. For example, you say that unused1, unused2, and sortInfoId
_must_ be zero, but if you install the DB and copy it
back to the PC, these fields may contain other values.
I
The first question I'm going to get when I post the format pointer to
ietf-types for review will be about character sets. In particular, I
specify that the docName is an ISO Latin-1 string, along with various
other string values. Is this right?
What about the character set in text data? Is it
I don't understand the def of the attributes field. It says here that
it takes 2 bytes, and that "The first 5 bits in the attributes are
unused, the 3 LSB indicates the amount of extra paragraph spacing
(2*value pixels)". What about the other 8 bits?
Bill
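To make the question concrete, here is a small sketch that decodes a 16-bit attributes value the way the quoted text describes: the 3 least significant bits give the extra spacing as 2*value pixels. Everything above those 3 bits is simply reported as "remaining", since which of those bits are really unused is exactly what is unclear. The function names are made up for illustration.

```python
def extra_spacing_pixels(attributes):
    """Extra paragraph spacing in pixels: 2 * (value of the 3 low-order bits)."""
    return (attributes & 0x07) * 2

def remaining_bits(attributes):
    """The 13 bits above the spacing field of a 16-bit attributes value."""
    return (attributes & 0xFFFF) >> 3

spacing = extra_spacing_pixels(0x0003)   # low bits are 011, i.e. value 3
leftover = remaining_bits(0x0003)        # nothing set above the low 3 bits
```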
But IMHO if a reader of the PluckerDB finds a SortInfo block, it
should ignore it and not treat the DB as invalid.
I don't know. If we are going to use it at all, we should define it.
On the other hand, if we did ignore it, it would be a handy place to
put expansion information.
The same as
You can change it to 13 bits. I will change the struct to the following,
typedef struct {
Int16 size; /* Size of text */
Int16 attributes; /* Paragraph info (see above) */
} Paragraph;
So the high-order 3 bits of attributes will be unused, and the
low-order 13
Note that when we translate a table, we put in the thick horizontal
rules between rows, I believe it is, and thin horizontal lines between
the columns of the row.
So that site's not too surprising...
Bill
which is
UInt32 encoding;
if (FtrGet(sysFtrCreator, sysFtrNumEncoding, &encoding) == 0) /* 0 means success */
{
... do something with the info...
}
Just to follow up, the list of possible encodings is (or rather, was
for PalmOS 3.5):
charEncodingAscii, // ISO 646-1991
You do not have to enter RFC 2368 compliant text in the strings that will
be the header, do you?
Good question. Your answer might depend on how much you trust the
various e-mail translating programs that handle the headers. For
example, I use pilot-mail (at least till pilot-xfer gets this
Not sure I understand your implementation here. Do you mean that
everyone who downloads this and builds the documentation, or links from the
documentation, does a checkout of the DBFormat.html file from the cvs,
runtime? Or that they are simply pointed to, using the OBJECT reference, a
That part works OK for me, too, but why is Profiling.py added to
the package when it can only be used if you have access to a
proprietary Xerox PARC module? That makes no sense to me...
Actually, it can be used whether or not you have code_timer.py, but it
only does something interesting if
Maybe OBJECT support should be added to the parser before
taking advantage of this HTML4.0 feature?
Well, perhaps not before, but I agree it should be added. I was
thinking of going right to XHTML, by adding an XML parser.
What user agents support this tag? I'm using Netscape 6.1 and it
Perhaps Mike has an opinion?
#1
I could implement either for the parser fairly easily.
I'm not going to implement support for this in the viewer, though. I
have enough on my plate as it is...
Sure. I could do the viewer part, too, I think, since it's only a
simple check possibly
AFAIK only Western European or Japanese. But what do all the other
devices that currently use Plucker with Chinese or Greek return? :-)
I'd like to know that myself.
Bill
Dirk writes:
Bill> In any case, we need some standard more prescriptive than string to
Bill> describe what's in those URL strings, for instance. The standard
Well, these strings are strings :-) A list of bytes terminated by a
zero char. The mapping from the byte value to a picture of the
http://validator.w3.org likes it now. You can see the results,
directly validated from the cvs here:
Thanks for doing this, David.
Bill
Or for simple characters like h2o and so on? What about (r) and (tm)
and (c) stuff now?
(r) and (c) are already in, since Latin-1 includes them as character
glyphs (0xAE and 0xA9, respectively). As for (tm), the Unicode code
charts seem to say that (r) means Registered trade mark sign.
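A quick check of those code points, as a sketch in current Python: 0xAE and 0xA9 decode from Latin-1 to the registered and copyright signs, while the trade mark sign (U+2122) has no Latin-1 encoding at all. fits_latin1 is a helper name invented for this example.

```python
# Decode the two Latin-1 byte values mentioned above.
registered = bytes([0xAE]).decode("latin-1")
copyright_sign = bytes([0xA9]).decode("latin-1")

def fits_latin1(text):
    """Return True if every character in text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

trademark_fits = fits_latin1("\u2122")   # U+2122 TRADE MARK SIGN
```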
(r) and (c) are already in, since Latin-1 includes them as character
glyphs (0xAE and 0xA9, respectively). As for (tm), the Unicode code
charts seem to say that (r) means Registered trade mark sign.
I guess my point was... are these now going to be superscripted?
Sorry, I
Sounds very reasonable. Just want to design in a way that the same document
is portable among different devices platforms like a Linux PDA, etc. For the
Yep, that's my goal, too.
Yes, we could design a custom font for the Palm viewer that includes a
number of characters. And that could
Sounds very reasonable. Just want to design in a way that the same
document is portable among different devices platforms like a Linux PDA,
etc.
*cough* XML... it was made for exactly that (well, that and to get
rid of the complexity of the non-standard, oddly used SGML tagsets)
Not concerning Plucker, but for another Palm project I am working on, is
there a reason why not to use a glyph on those character slots (as these are
the ones that Palm ROM symbol fonts use, and seems to be okay)?
Not that I can think of. They should never be displayed in normal
text use,
Or take a look at Incs/Core/System/CharLatin.h:
#define chrLeftSingleQuotationMark 0x0091
#define chrRightSingleQuotationMark 0x0092
#define chrLeftDoubleQuotationMark 0x0093
#define chrRightDoubleQuotationMark 0x0094
Bill
Just an idea: what if the parser replaced the euro symbol with the
string EUR, so we get the info on all devices?
I rather like Robert's original idea of adding a function code which
would let us insert a unicode character code in the text. Then when
the viewer saw the character it could
Macintosh will still be a problem, since there is no shell.
Perhaps there is under OS X.
Bill
Remember, implementing an XML parser is no trivial matter. If the
XML page or application fails validation, the page is bitbucketed. In the
current scheme, Plucker tries to make sense of what's left of the broken
HTML, but with XML, that's not allowed.
Luckily, Python 2 comes with
Cool!
Bill
Jorge,
How would you like the timeout to be specified? A
command-line/config-file parameter?
Bill
David,
I was browsing through the Web site today, and was really impressed by
the screen shots you've got up on the Downloads page.
I was thinking that it would also be impressive to move some of
them, say 4, to the main page, and put them in a row just above the
first News item. Give
David> The header passed to the server is, in fact, HTTP_REFERER. Here's
No, that's the name of the environment variable passed to the CGI
David> an example of the entire environment requested in a GET from a local page on
David> http://localhost/index.pl on my laptop.
Ahhhgggh, why
When I build the PluckerDB on the command line, as soon as it sees any
images it says:
Could not load PIL library
I suspect that you have not installed the PIL properly. To test this,
just start python on the command line, and type import Image. If
PIL is properly installed, this should
Would it
create a list of fetches needed, get promises, get the pages, parse them,
and then do the loop over again? So like 1 on the first pass, then maybe
6, then maybe 25 (this is sort of how my pages are set up). This seems
like it may need a good deal of changes in the parser.
Yep,
Maybe we should call 'em 'Chickens', since you Pluck a chicken before eating
it. Hahahaa... no.
How about Feathers? That's what's plucked, isn't it?
Bill
I was wondering, where exactly is the usual bottleneck in the Parser in
terms of parsing speed?
I've been profiling it, and there's no real hot spot. The main use of
wallclock is for (1) fetching of pages, and (2) parsing of HTML. If
pages could be fetched in parallel, that would probably
However, I think the only major change would be to Spider.py, which
badly needs to be re-written in any case (which I'm doing right now,
by the way).
Just to amplify: I'm doing some general cleanup (like replacing
obsolete constructs like string.atoi() with int), and cleaning up
the
You have to build several arrays
which construct silos to fetch and parse from. You should have seen,
dead-or-down, fetch, duplicate, and completed arrays (or however
Python keeps it straight).
Yes, that sounds like a good design.
Bill
I'm looking at the support for alternate sizes of images, and
wondering if something should change.
To refresh everyone's memory: There are two kinds of image pages
included in a Plucked document, inline and separate. Inline occur in
a text page; separate occur as separate pages, either the
I don't think we should stall our development just because python 2
isn't available on all systems. It's not like the old parser will
disappear (and neither is 2.x bleeding edge:)
2.0 is available in debian-unstable, but for some reason isn't in
debian-testing. 2.0 seems to be the default
I suggest a small block of new options:
--font=std,bold,large,largebold,narrow
If we're moving towards storing the actual font value in the
database itself, why not put a pointer there to specify it upon creation?
I don't think I understand this suggestion, David. The
There is an excuses or reasons file or similar which you can check to
find out.
MJ, can you please provide a URL for that file, so we could check on
the reason? But my point was that Python 2.x for Debian is available.
Bill
Personally, I compile those critical system things myself from
source, not packages. That includes perl, apache, python, gcc, yadda yadda.
Yeah, me too.
Nothing we have in there really is such
a new whizbang feature as to require the 2.x series (yet). We should
continue to support a
python2.2 seems to have
some problem on mips, so is also not included.
2.2 was still in beta last week, so I'm not terribly surprised.
Bill
Peter,
You're just tripping over invalid HTML. Someone's got an
<HR SIZE=.5> in there, which is an invalid size specification (see
http://www.w3.org/TR/html4/types.html#type-pixels -- has to be an
integer). Nothing to do with images.
We could error-check these values more carefully in the
I realized I didn't know how the URLs in the URL data records are related to the
information in the pages. Could someone who understands it (Chris, Mike?) please
send or post an amplified description? Thanks.
Bill
I'm far from an expert on this kind of stuff; does somebody know
about this? Or would we be happy if I added a new option in the parser like
'-sjis' which always converts characters to Shift JIS code?
Hi, Nori.
I do know something about this. I think we should in general add an
option to output
Mike, Chris,
I'm trying to figure out how a URL for an uncollected link is
attached to that link, so that the viewer knows to display it when
the link is tapped. The URLs, presumably, are kept in the URL record,
but what is stored in that link to find the URL?
Bill
So here's what I'm planning:
Read all the stdin till we hit end-of-file, treat it as whatever type
is specified on the command line, and process it as the home document.
Probably need two new command-line switches, --stdin-type=foo and
--stdin-url=foo, to allow you to specify the stdin
Nori,
Is this for text documents, or do you also process S-JIS HTML? Does
the parser work OK on S-JIS HTML?
Bill
Hi Bill,
At Wed, 31 Oct 2001 15:59:29 PST,
Bill Janssen wrote:
There's a question here of what the output character set should be for
a Plucker doc, something we really
I see that the current way a missing tag (like a URL with
http://foo.bar.com/bletch.html#tag
but there's no tag in bletch.html) is handled is to point the link
to the beginning of the page which presumably would have contained the
link (in the above example, to paragraph 0 of bletch.html).
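As a sketch of the behaviour just described (not the actual parser code), resolving a link could look like the following. The anchors_by_page mapping and the resolve_link name are hypothetical; the mapping is assumed to go from page URL to a dict of tag name to paragraph index.

```python
def resolve_link(url, anchors_by_page):
    """Return (page, paragraph) for url, falling back to paragraph 0."""
    page, _, tag = url.partition("#")
    anchors = anchors_by_page.get(page, {})
    # A missing (or empty) tag points at the beginning of the page.
    return page, anchors.get(tag, 0)

# Hypothetical anchor table: bletch.html defines "intro" but not "tag".
anchors_by_page = {"http://foo.bar.com/bletch.html": {"intro": 4}}

found = resolve_link("http://foo.bar.com/bletch.html#intro", anchors_by_page)
missing = resolve_link("http://foo.bar.com/bletch.html#tag", anchors_by_page)
```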
The MSN change has affected Slate.com, an online magazine owned by MS.
The re-styling is so bad that I figured I'd start plucking it instead
of looking at it in a browser. Unfortunately, it's in UTF-8 and
XHTML, and contains a number of the standard odd characters. I
wrote a little csh/Python
It smells like a missing Content-type header, and I thought there
was a fix in cvs for this. I recall someone mentioning this before, but I
can't find the reference in my archives.
But it seems to have a content-type:
$ HEAD http://www.netlet.net/interact/Babe.jpg
406 No acceptable
Ah, yes, I see what you mean. Clueless of me.
Bill
Just a note to all parser hackers: string.atoi() has been deprecated,
and should be avoided. Use int() instead -- it works on strings and
numbers.
Bill
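A few examples of what int() accepts, shown here as run on a current Python; the variable names are only for illustration.

```python
# int() accepts the same decimal strings string.atoi() did.
count = int("42")
# Surrounding whitespace in the string is ignored.
padded = int(" 17 ")
# An explicit base works for non-decimal strings.
mask = int("ff", 16)
# Numbers pass through too; floats are truncated toward zero.
truncated = int(3.9)
```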
Please feel free to modify/improve as you see fit, as I am
not much of a python wizard. If you pass everything around as a 3-ple, you
can chop out the itoa function from PluckerDocs.py.
Another thought I had was to use function code 0x1B instead of 0x53,
which saves a tiny bit in the function
A scrollbar/jump location relative to the number of paragraphs could
be a problem. In a document that is one huge paragraph the scrollbar
would not be of any help to the user. Neither will the scrollbar/jump
location be updated while you scroll *within* a paragraph. Could be
confusing.
The exact reason is that plucker recognizes gzip as an encoding, while
it does not recognize x-gzip which is also used. In the meantime, I will
change my encoding to be gzip.
gzip is a registered HTTP content-coding transfer parameter (see
http://www.iana.org/assignments/http-parameters)
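One way the parser side could cope, as a sketch only: treat x-gzip as an alias for gzip when decoding a response body. As far as I recall, the HTTP/1.1 spec itself suggests considering the two equivalent. decode_body and GZIP_ALIASES are hypothetical names, and gzip.decompress needs a current Python.

```python
import gzip

# Content-coding names treated as gzip.
GZIP_ALIASES = {"gzip", "x-gzip"}

def decode_body(body, content_encoding):
    """Decompress body if its Content-Encoding names gzip or x-gzip."""
    coding = (content_encoding or "").strip().lower()
    if coding in GZIP_ALIASES:
        return gzip.decompress(body)
    return body                    # unknown or absent coding: leave untouched

compressed = gzip.compress(b"<html>hello</html>")
page = decode_body(compressed, "x-gzip")
```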
Looks like a missing/bad Accept: header. I didn't think it was required
by the HTTP RFC, but Microsoft has never been one to adhere to standards.
It's probably correct to send an Accept: header that looks something like:
Accept: text/*, image/*
Yes, this looks like a good idea.
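For reference, a minimal sketch of attaching that header with the standard library's urllib.request on a current Python. build_request is a made-up helper, the URL is a placeholder, and nothing is actually sent here.

```python
import urllib.request

def build_request(url):
    """Build a request that advertises the types Plucker can use."""
    return urllib.request.Request(url, headers={"Accept": "text/*, image/*"})

req = build_request("http://www.example.com/picture.jpg")
accept = req.get_header("Accept")
```

Passing req to urllib.request.urlopen would then send the header along with the GET.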
Well, I can always help a bit with PHP. Hate to admit it :-).
Bill
I've checked in a number of changes to the parser, mainly cleanup.
I'm only half-done with this, but it's at an internally consistent
state, and I thought I'd check it in so that people can look for
problems I haven't spotted.
Dirk, you'll want to look at what I've done in ImageParser.py.
I've
- Do you also want a way to let the user specify the max_tbmp_size?
config.get_int ('max_tbmp_size', SimpleImageMaxSize)
I suppose, but it strikes me as an internal detail that the user
generally shouldn't care about.
- The lines
while len(newbits) SimpleImageMaxSize:
Nicholas,
It should be possible to install on a Mac. First you need to install
Python (http://www.cwi.nl/~jack/macpython.html), and make sure to also
install PIL as part of that (it's an optional part of the
installation).
Then install the Python files in a folder somewhere. Invoke