At 3:49 pm +0200 27/05/02, Bart Lateur wrote:

>Sorry for the late reply. Actually, no, I'm not sorry, I've been away
>for a few weeks, so it's actually not my fault. :-)

Bart's note has finally turned up here.

>
>Problem 2:
>>index() will also find blocks which look like the right one
>>but are really the wrong objects ("14 0 obj", "4 0 obj").
>
>Then use a regex. No need to use pos() any more to find out where the
>match starts, $-[0] can tell you it is. $info_block should contain a
>number, shouldn't it?
>
>	my $info_start = $str =~ /\b$info_block 0 obj\b/ && $-[0];
>
>I think this will even be plug-in compatible with your original
>solution.

In fact you do not know where the 'info_block' is -- it can be almost
anywhere in the file. The 'classical' method is to look up its
string-position in the cross-reference table. However this becomes
quite a convoluted process if the PDF file has been 'linearised'. The
first version of the script used direct look-up, and that was abandoned
because of the complications with linearised files.

The second try used a regex. There are three difficulties with that.

In the first place you have to use $` to get the starting point of the
'info_block', which is not very nice.

Secondly you don't know where the end of the 'info_block' is, so you
have to guess how much to fetch in order to be sure you have included
the line beginning '/Title:'.

The third difficulty is avoiding a false match (of the kind Axel
mentions). In the ordinary run of events one would match against
/^14 0 obj/. But here you have to be careful because the PDF file can
have any of the three line-endings. I think a regex of the kind
/(?:\012|\015)14 0 obj/ would probably be water-tight, but I haven't
tried it. (The grouping matters: without it the alternation would
match a bare \012 on its own.)

Most of the problems disappear if you use index() to find the
positions of the start and end of the 'info_block'. Admittedly, having
found a candidate, you have to look back to make sure the preceding
character is a line-break of some sort, which involves a loop of the
kind Axel suggested. However this is a very economical solution, since
nine times out of ten the first candidate will be the right one.
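
As a rough sketch of the index() approach with the look-back check
described above (this is not the actual script: the helper name
find_obj_start and the sample string are invented for illustration, and
the object number 14 is just the example used in this thread):

```perl
use strict;

# Hypothetical helper: return the offset of "$num 0 obj" in $str,
# accepting a candidate only if it is at the very start of the string
# or directly preceded by CR (\015) or LF (\012). Returns -1 if no
# candidate qualifies.
sub find_obj_start {
    my ($str, $num) = @_;
    my $needle = "$num 0 obj";
    my $pos    = 0;
    while ((my $hit = index($str, $needle, $pos)) >= 0) {
        return $hit if $hit == 0;
        my $prev = substr($str, $hit - 1, 1);      # look back one character
        return $hit if $prev eq "\012" || $prev eq "\015";
        $pos = $hit + 1;                           # false match, try the next one
    }
    return -1;
}

# Invented sample data with CR line endings; not a real PDF.
my $sample = "114 0 obj\015<< /Type /Page >>\015endobj\015"
           . "14 0 obj\015<< /Title (Test) >>\015endobj\015";

my $start = find_obj_start($sample, 14);
my $end   = index($sample, "endobj", $start);      # end of the block
my $block = substr($sample, $start, $end - $start);

# Regex cross-check: the look-behind rejects any preceding character
# that is not a line break, so $-[0] is the offset of the "14" itself.
my $re_start = $sample =~ /(?<![^\012\015])14 0 obj/ ? $-[0] : -1;
```

Here the first candidate (inside "114 0 obj") is rejected by the
look-back and the second is accepted, so the loop runs twice at most.
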
Even if the first candidate fails, the second candidate is almost
bound to succeed. The only time lost, so to speak, is that taken to
look backwards, which is pretty negligible.

At 4:09 pm -0400 27/05/02, Chris Nandor wrote:

>Sorry!  Space at the end of the filename.  ":file.txt " <-- space here.

Grrr. No, my fault. I looked at that for an age and missed it. Gray
cell deficit rather than eyesight I think.

At 4:08 pm -0400 27/05/02, Ronald J Kimball wrote:

>> >The greater danger with C< open F, $f > is that the filename might begin
>> >with a ">" or somesuch. Both three-arg open, and the method above with
>> >"\0", solve both problems; but the latter method works in any version of
>> >perl. I am not a big fan of three-arg open, but I have to admit it looks a
>> >lot nicer. :-)
>
>Note that this solution does not work with leading spaces, however. You
>have to use sysopen or the new three-arg open to handle those.

So I suppose for the time being, until all the world has updated to
5.6.x, one should use 'sysopen()' as a matter of course?

Alan Fry
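
For what it is worth, a minimal sketch of the difference in question,
assuming a Unix-style file system that allows a trailing space in a
file name (the temporary directory and the name "file.txt " are
invented for the test; bareword filehandles are used only to keep the
style close to the older idiom):

```perl
use strict;
use Fcntl qw(O_RDONLY O_WRONLY O_CREAT O_TRUNC);
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $name = File::Spec->catfile($dir, "file.txt ");   # note the trailing space

# sysopen takes the name literally, so the space is kept.
sysopen(OUT, $name, O_WRONLY | O_CREAT | O_TRUNC) or die "sysopen: $!";
print OUT "hello\n";
close(OUT) or die "close: $!";

# Two-arg open strips leading and trailing whitespace from the name,
# so it looks for "file.txt" (no space), which does not exist here.
my $two_arg_ok = open(IN2, $name) ? 1 : 0;
close(IN2) if $two_arg_ok;

# sysopen finds the file under its real name.
sysopen(IN, $name, O_RDONLY) or die "sysopen: $!";
my $line = <IN>;
close(IN);
```
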