> I am an intermediate Perl user. I taught myself Perl by reading "Learning
> Perl," with some online tutorials, and I have some other reference texts. I
> can generally do what I need to with Perl, but my code is far from
> elegant. I understand the very basics of object-oriented programming in
> Perl, but I generally need sample code to get started with modules from
> CPAN. I am a professor at Rice University and have found Perl to be
> invaluable for extracting data for my research, especially the regular
> expression capabilities of Perl. I have been unable to attend any of the
> monthly meetings, but hope to in the future.
>
> For my current project, I am trying to extract historical financial
> statement data from www.marketwatch.com. The URL is
> http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0 .
> I use WWW::Mechanize to download the webpage and then I use
> HTML::TableExtract to extract the text that I need. I want to transpose the
> table at depth=1, count=1 after extracting it so that each year is a row and
> each variable is a column. I have not been able to find any documentation
> on how to extract a column from a table using HTML::TableExtract.
>
> The following simple program downloads the data using WWW::Mechanize,
> extracts the table with HTML::TableExtract, and prints each row.
>
> #!/usr/bin/perl
>
> use HTML::TableExtract;
> use WWW::Mechanize;
> use strict;
>
> # Fetch the page; autocheck makes a failed request die.
> my $marketwatch = WWW::Mechanize->new( autocheck => 1 );
> $marketwatch->get("http://www.marketwatch.com/tools/quotes/financials.asp?symb=ABSD&sid=0&report=2&freq=0");
>
> chomp(my $html = $marketwatch->content);
>
> # Select the table at depth 1, count 1.
> my $table = HTML::TableExtract->new(keep_html => 0, depth => 1,
>     count => 1, br_translate => 0 );
> $table->parse($html);
>
> foreach my $row ($table->rows) {
>     print join("\t", @$row), "\n";
> }
>
> I am not able to figure out how to use the columns method. My intuition
> makes me think it should be something like the following (but my intuition
> is wrong):
>
> foreach my $column ($table->columns) {
>     print join("\t", @$column), "\n";
> }
>
> The error message I get says: Can't locate object method "columns" via
> package "HTML::TableExtract". The documentation doesn't shed much light
> (for me anyway). I can see in the code of the module that the columns
> method belongs to HTML::TableExtract::Table, but I can't figure out how to
> use it.
>
> I appreciate any help. For an experienced programmer, I am sure this is
> trivial, but I am the closest thing to a programmer in my department, and I
> don't really have anyone around me that I can get help from.
>
> _______________________________________________
> Houston mailing list
> [email protected]
> http://mail.pm.org/mailman/listinfo/houston
> Website: http://houston.pm.org/

It looks like you need to call the columns method on an
HTML::TableExtract::Table object, not on the HTML::TableExtract object.

From the docs and your email, maybe something like this could get you
started:

my $table = HTML::TableExtract->new(keep_html => 0, depth => 1,
    count => 1, br_translate => 0 );
$table->parse($html);

# table(depth, count) returns the HTML::TableExtract::Table object.
my $t = $table->table(1,1);

foreach my $column ($t->columns) {
    print join("\t", @$column), "\n";
}
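For readers who want to see what the transposition amounts to, here is a minimal sketch in plain Perl that needs no CPAN modules. It assumes the rows arrive as a list of array refs, the same shape the row loop in the question iterates over. The `transpose` helper and the sample values are hypothetical and are not real MarketWatch data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the list returned by $t->rows: one array ref
# per table row. The values are made up for illustration only.
my @rows = (
    [ 'Item',       '2004', '2005' ],
    [ 'Sales',      '10',   '12'   ],
    [ 'Net income', '1',    '2'    ],
);

# transpose(@rows): return a list of array refs, one per column.
# Short (ragged) rows are padded with empty strings.
sub transpose {
    my @in = @_;

    # Find the widest row so every output column has an entry per row.
    my $width = 0;
    for my $row (@in) {
        $width = @$row if @$row > $width;
    }

    my @out;
    for my $c (0 .. $width - 1) {
        push @out, [ map { defined $_->[$c] ? $_->[$c] : '' } @in ];
    }
    return @out;
}

# Each printed line is now one original column, e.g. one year per line.
my @columns = transpose(@rows);
print join("\t", @$_), "\n" for @columns;
```

With the real module, the loop over `$t->columns` in the reply does this step for you; the helper above is only useful if you already hold the rows in memory or want to pad ragged rows yourself.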
Thanks. This works perfectly and saved me hours!
