Wow!

Thank you, guys, for the great ideas so far.

To simplify the task somewhat, I intend to use the
local adaptive threshold algorithm. So the very same image
now becomes this:
http://picasaweb.google.com/capsunel/Imaging/photo#5074975837667856626

The command was very simple:
convert book_page.jpg -lat 5x5-5% -monochrome book_page_lat.jpg
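
For readers unfamiliar with -lat, here is a rough Python sketch of what a local adaptive threshold does: each pixel is compared with the mean of its surrounding window plus an offset, and becomes white only if it is lighter than that. The function name, the list-of-rows image format and the border clipping are assumptions made for illustration; this is not ImageMagick's implementation, and its edge handling may differ.

```python
def local_adaptive_threshold(gray, window=5, offset=-0.05):
    """Threshold each pixel against the mean of its local window.

    gray   -- list of rows, each a list of floats in [0, 1] (hypothetical format)
    window -- side length of the square neighbourhood (5 mirrors "5x5")
    offset -- added to the local mean before comparing (-0.05 mirrors "-5%")
    Returns rows of 1 (white) and 0 (black).
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood clipped at the image borders (an assumption).
            neighbours = [
                gray[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(neighbours) / len(neighbours)
            # Lighter than (local mean + offset) -> white, otherwise black.
            row.append(1 if gray[y][x] > local_mean + offset else 0)
        result.append(row)
    return result


if __name__ == "__main__":
    # A light page (0.8) with a single dark "ink" pixel in the middle.
    page = [[0.8] * 7 for _ in range(7)]
    page[3][3] = 0.1
    for line in local_adaptive_threshold(page):
        print("".join("#" if v == 0 else "." for v in line))
```

Because the comparison is local, uneven lighting across the photographed page matters less than it would with a single global threshold.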

Now, I'm guessing, the problem is easier to approach, for example
the way Marcel proposed.
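
Once the page is black and white, finding the text area comes down to finding the bounding box of the ink pixels and cropping with it. The sketch below shows that step on a tiny in-memory image. The function names and the 0 = ink, 1 = paper convention are assumptions for illustration, and it presumes stray specks outside the text have already been cleaned up, since any leftover dark pixel would enlarge the box.

```python
def text_bounding_box(binary, ink=0):
    """Return (left, top, width, height) of all ink pixels, or None if none.

    binary -- list of rows of 0/1 values (hypothetical format, 0 = ink)
    """
    if not binary:
        return None
    rows = [y for y, row in enumerate(binary) if ink in row]
    if not rows:
        return None
    cols = [
        x for x in range(len(binary[0]))
        if any(row[x] == ink for row in binary)
    ]
    left, top = cols[0], rows[0]
    return (left, top, cols[-1] - left + 1, rows[-1] - top + 1)


def crop(image, box):
    """Cut the rectangle described by box out of image (list of rows)."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


if __name__ == "__main__":
    # 6 rows x 8 columns of paper with a small block of "text".
    page = [[1] * 8 for _ in range(6)]
    for y in (2, 3):
        for x in (1, 2, 4):
            page[y][x] = 0
    box = text_bounding_box(page)
    print(box)
    print(crop(page, box))
```

The resulting numbers map onto ImageMagick's WxH+X+Y geometry, so they could be applied to the original photograph with -crop.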

Another approach (which I'll definitely try) is to work like the
unpaper tool (thanks, ignotus, for the idea):
http://unpaper.berlios.de/

The tool does some "magick" on images (scanned pages) to make
them more readable for both humans and OCR engines. There is
just one source file, and it is easy to follow.
=)

Still, if somebody has other ideas, I'm curious to hear them.

Alex







Wouterse, Marcel wrote:
Hi,

The first method that comes to mind:

Make the white part transparent (with a fuzz factor) and then do a
repage or something...

Regards,
Marcel

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alexandru Ciobanu
Sent: Monday, 11 June 2007 15:54
To: [email protected]
Subject: Re: [magick-users] extracting text area from image


Hi, Ron!

I need (1), i.e. to extract an image which is the same size as the text
area. I will use a dedicated tool for OCR (2), which is the next step.

Basically this must be a primitive implementation of layout analysis. =)

And the image is here (I thought the attachment would go through):
http://picasaweb.google.com/capsunel/Imaging/photo#5074517723571163346

Note: the red area is not really important.

Alex

PS: I've posted the same question here:
http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=8949


On 6/10/07, Ron Savage <[EMAIL PROTECTED]> wrote:
Alexandru Ciobanu wrote:

Hi Alexandru

> I am trying to use ImageMagick to extract strictly
> the text area from a photograph of a book page.

Do you mean

(1) extract an image which is the same size as the text area, or
(2) extract the text letter-by-letter

The latter is called Optical Character Recognition, and I do not know of any such feature within IM.

> If you look at the image attached, I am interested in the green area
> and, if possible, the red area.

No image attached. Please upload to your web site.

> The problem is that it has to be automated and work
> for books of various sizes.

Sure.

> My idea so far is:
> apply a really crazy filter that would transform the
> green area into a big uniform blob, so that I can
> then extract its coordinates, and then use those
> on the original image.

Sounds reasonable, but also sounds like (1) above.
--
Ron Savage
[EMAIL PROTECTED]
http://savage.net.au/
_______________________________________________
Magick-users mailing list
[email protected]
http://studio.imagemagick.org/mailman/listinfo/magick-users
