Hi
I wonder if you could help me set up a script that gathers a few lines
of text from a bunch of HTML documents and then places them in an Excel
spreadsheet. We lost a site, and all we have is a backed-up local copy that
unfortunately does not include the site database, just the generated
HTML pages. We need to get the data into a database again.
I have a folder with subfolders. In the subfolders are HTML documents. In
these documents, there are a few strings of text I need to gather and place
in a spreadsheet. Each HTML document is for one product/magazine.
The text I am looking for in each of these is in the HTML code. I need the
magazine title and the cover price which is found in this text string:
* <h1 class="inner_black_txt">4-Wheel ATV Action Magazine</h1>*
* <div class="padding1"></div>*
* <div class="car_top_txt_left gray_txt4">Cover
Price:</div><div class="fleft gray_txt1"><strike><span
id="rrpSpan4246">$59.88</span></strike></div>*
* <div class="clear"></div>*
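A minimal Python sketch of this step, assuming every product page uses the same `inner_black_txt` heading and an `rrpSpan<number>` span for the cover price, as in the snippet above. The function names `clean` and `title_and_price` are made up for illustration, and regular expressions are a shortcut that only holds up while the generated markup stays this regular.

```python
import re
from html import unescape

# Patterns copied from the markup quoted above; other pages are assumed to match.
TITLE_RE = re.compile(r'<h1\s+class="inner_black_txt"[^>]*>(.*?)</h1>', re.S)
PRICE_RE = re.compile(r'<span\s+id="rrpSpan\d+"[^>]*>\s*(\$[\d.,]+)\s*</span>', re.S)

def clean(fragment):
    """Drop inner tags, decode entities, collapse whitespace and line breaks."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(unescape(text).split())

def title_and_price(page):
    """Return (title, cover_price); an empty string marks a missing field."""
    title = TITLE_RE.search(page)
    price = PRICE_RE.search(page)
    return (clean(title.group(1)) if title else "",
            price.group(1) if price else "")
```

The `\s+` between the tag name and the attribute lets the pattern match even where the saved HTML wraps in the middle of a tag.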
Then I need the product description, which always appears under a heading
starting with "About". It is found here:
* <div class="automaticly_rel">*
* <h2 class="car_absol_title3">About 4-Wheel ATV Action
Magazine</h2>*
* <div class="padding1"></div>*
* <div class="white_corner_tl"></div><div
class="white_corner_t" style="width:773px;"></div><div
class="white_corner_tr"></div>*
* <div class="white_corner_m" style="width:783px;">*
* <div class="white_corner_m_inner4">*
* <p class="gray_txt"
style="line-height:18px;">4-Wheel ATV Action is the ultimate
high-performance all terrain vehicle magazine. Each issue gives you full
racing coverage, results and schedules, ATV performance tests and shootout
competitions, product reviews, and interviews with leading ATV
personalities. This magazine subscription brings you all the best action
with full-color photos and in-depth articles.</p>*
* <div class="clear"></div>*
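The description could be pulled out along these lines in Python. This is only a sketch: it assumes the description is always the first `<p class="gray_txt">` after the `car_absol_title3` heading that starts with "About", as in the snippet above, and the name `description` is hypothetical.

```python
import re
from html import unescape

# Heading that starts with "About", then the first gray_txt paragraph after it.
DESC_RE = re.compile(
    r'<h2\s+class="car_absol_title3"[^>]*>\s*About\b.*?</h2>'
    r'.*?<p\s+class="gray_txt"[^>]*>(.*?)</p>',
    re.S,
)

def description(page):
    """Return the description as one line of text, or "" if it is not found."""
    match = DESC_RE.search(page)
    if not match:
        return ""
    text = re.sub(r"<[^>]+>", " ", match.group(1))  # drop any inline tags
    return " ".join(unescape(text).split())         # collapse the line breaks
```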
Then (if possible) I need the related magazines that are listed in a section
further down, which looks like this:
*<div class="mid_bg2">*
* <div class="popular_magzine_mid_inner">*
* <div
class="popluar_cart_box">*
* <div class="magz_thumb_bg"><a
href="../5498/dirt-rider.html"><img
src="../../shopimages/products/thumbnails/extra/dirtriderjuly2013.jpg"
border="0" width="91" alt="Dirt Rider"
class="magz_thumb_padding"></a></div>*
* <div class="clear"></div>*
* <div align="center"
class="popular_blue_txt"><a href="../5498/dirt-rider.html">Dirt
Rider</a></div>*
* <div class="clear"></div>*
* <div class="popular_gray_txt2"
align="center">Normal Price: <strike>$59.88/year</strike></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt3">Our Price:</span> <span
class="popular_red_txt2">$11.99</span></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt2">You save:</span> <span
class="popular_black_txt1">$47.89 (79.98%)</span></div>*
* <div class="clear"></div>*
* <div align="center"><a
href="../5498/dirt-rider.html"><img src="../../images/leran_more_butt.png"
border="0" alt=""></a></div>*
* </div>*
* <div class="popluar_cart_box_gap"> </div>*
* <div
class="popluar_cart_box">*
* <div class="magz_thumb_bg"><a
href="../5767/motocross-action.html"><img
src="../../shopimages/products/thumbnails/extra/Motocross-Action-Magazine.jpg"
border="0" width="91" alt="Motocross Action"
class="magz_thumb_padding"></a></div>*
* <div class="clear"></div>*
* <div align="center"
class="popular_blue_txt"><a href="../5767/motocross-action.html">Motocross
Action</a></div>*
* <div class="clear"></div>*
* <div class="popular_gray_txt2"
align="center">Normal Price: <strike>$59.88/year</strike></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt3">Our Price:</span> <span
class="popular_red_txt2">$19.99</span></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt2">You save:</span> <span
class="popular_black_txt1">$39.89 (66.62%)</span></div>*
* <div class="clear"></div>*
* <div align="center"><a
href="../5767/motocross-action.html"><img
src="../../images/leran_more_butt.png" border="0" alt=""></a></div>*
* </div>*
* <div class="popluar_cart_box_gap"> </div>*
* <div
class="popluar_cart_box">*
* <div class="magz_thumb_bg"><a
href="../7806/4-wheel---off-road.html"><img
src="../../shopimages/products/thumbnails/extra/4wheeljuly2013.jpg"
border="0" width="91" alt="4 Wheel & Off Road"
class="magz_thumb_padding"></a></div>*
* <div class="clear"></div>*
* <div align="center"
class="popular_blue_txt"><a href="../7806/4-wheel---off-road.html">4 Wheel
& Off Road</a></div>*
* <div class="clear"></div>*
* <div class="popular_gray_txt2"
align="center">Normal Price: <strike>$47.88/year</strike></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt3">Our Price:</span> <span
class="popular_red_txt2">$11.99</span></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt2">You save:</span> <span
class="popular_black_txt1">$35.89 (74.96%)</span></div>*
* <div class="clear"></div>*
* <div align="center"><a
href="../7806/4-wheel---off-road.html"><img
src="../../images/leran_more_butt.png" border="0" alt=""></a></div>*
* </div>*
* <div class="popluar_cart_box_gap"> </div>*
* <div
class="popluar_cart_box">*
* <div class="magz_thumb_bg"><a
href="../4576/dirt-wheels.html"><img
src="../../shopimages/products/thumbnails/extra/Dirt-Wheels.jpg" border="0"
width="91" alt="Dirt Wheels" class="magz_thumb_padding"></a></div>*
* <div class="clear"></div>*
* <div align="center"
class="popular_blue_txt"><a href="../4576/dirt-wheels.html">Dirt
Wheels</a></div>*
* <div class="clear"></div>*
* <div class="popular_gray_txt2"
align="center">Normal Price: <strike>$66.00/year</strike></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt3">Our Price:</span> <span
class="popular_red_txt2">$19.99</span></div>*
* <div class="clear"></div>*
* <div align="center"><span
class="popular_gray_txt2">You save:</span> <span
class="popular_black_txt1">$46.01 (69.71%)</span></div>*
* <div class="clear"></div>*
* <div align="center"><a
href="../4576/dirt-wheels.html"><img src="../../images/leran_more_butt.png"
border="0" alt=""></a></div>*
* </div>*
* <div class="popluar_cart_box_gap"> </div>*
* </div>*
* </div>*
* <div class="mid_bot_bg"></div>*
The related magazines section may be missing, and it can list any number of
magazines (0-20 different titles).
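For the related titles, one possible Python approach is to collect the link text inside every `popular_blue_txt` div. The sketch below assumes each related magazine has exactly one such div, as in the section above; `related_titles` is a made-up name. A page without the section simply yields an empty list.

```python
import re
from html import unescape

# The class attribute may come after other attributes (align="center" above).
RELATED_RE = re.compile(
    r'<div\s[^>]*class="popular_blue_txt"[^>]*>\s*<a\s[^>]*>(.*?)</a>',
    re.S,
)

def related_titles(page):
    """Return the related magazine titles in page order (possibly empty)."""
    return [" ".join(unescape(title).split())
            for title in RELATED_RE.findall(page)]
```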
*I know it must be possible to gather this data from the many HTML files I
have (product pages), but I cannot figure out how to script it*. I also need
all the data listed in an Excel sheet or CSV file with this structure:
*Magazine title, Cover Price, Product description, Related Magazines* (up
to 20 columns).
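A rough Python sketch of how that layout could be produced: walk the folder and its subfolders, read each `.html` file, and write one CSV row per page. The `extract` argument is a hypothetical function that takes the page text and returns `(title, price, description, related_list)`. The `.html` extension, the UTF-8 encoding, and the column header names are assumptions, not something the backup is known to guarantee.

```python
import csv
from pathlib import Path

MAX_RELATED = 20  # the layout above allows up to 20 related-magazine columns

def build_csv(root, out_path, extract):
    """Write one row per .html file found under root (subfolders included)."""
    header = ["Magazine title", "Cover Price", "Product description"]
    header += [f"Related {i}" for i in range(1, MAX_RELATED + 1)]
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for path in sorted(Path(root).rglob("*.html")):
            # Encoding is assumed; undecodable bytes are replaced, not fatal.
            page = path.read_text(encoding="utf-8", errors="replace")
            title, price, description, related = extract(page)
            related = list(related)[:MAX_RELATED]
            related += [""] * (MAX_RELATED - len(related))  # pad short rows
            writer.writerow([title, price, description] + related)
```

Excel opens the resulting file directly, and the `csv` module takes care of quoting descriptions that contain commas.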
Is this at all possible, and could you maybe point me in the right direction
or send me a script based on the data I have presented here? That
would be absolutely awesome!
Best Regards
Johan Niklasson