Hi

I wonder if you could help me set up a script that gathers a few lines 
of text from a bunch of HTML documents and places them in an Excel 
spreadsheet. We lost a site, and all we have is a backed-up local copy that 
unfortunately does not include the site database, just the generated 
HTML pages. We need to get the data into a database again.

I have a folder with subfolders. In the subfolders are HTML documents. In 
these documents, there are a few strings of text I need to gather and place 
in a spreadsheet. Each HTML document is for one product/magazine.
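
A minimal sketch of how the folder walk could look in Python, using only the standard library. Python itself and the function name `find_html_files` are assumptions made for illustration; the only thing taken from the description is that the pages sit in subfolders of one top-level folder.

```python
from pathlib import Path

# Assumption: product pages end in .html or .htm (case-insensitive).
HTML_SUFFIXES = {".html", ".htm"}


def find_html_files(root):
    """Return every HTML file under root, including subfolders, sorted by path."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    )
```

Each returned path can then be read with `path.read_text(encoding="utf-8", errors="replace")`; the encoding of the backed-up pages is not known, so that argument may need adjusting.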

The text I am looking for in each of these is in the HTML code. I need the 
magazine title and the cover price, which are found in this snippet:

    <h1 class="inner_black_txt">4-Wheel ATV Action Magazine</h1>
    <div class="padding1"></div>
    <div class="car_top_txt_left gray_txt4">Cover Price:</div><div class="fleft gray_txt1"><strike><span id="rrpSpan4246">$59.88</span></strike></div>
    <div class="clear"></div>
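
A rough sketch of pulling these two values out with Python's standard `re` and `html` modules. It assumes every page uses the same class name on the `<h1>` and an `id` of the form `rrpSpan` plus digits on the price span, as in the snippet above. The function names are hypothetical, and regular expressions are only a reasonable choice here because the pages were generated from one template.

```python
import html
import re

# Assumption: the title is the only <h1> with class "inner_black_txt".
TITLE_RE = re.compile(
    r'<h1[^>]*class="inner_black_txt"[^>]*>(.*?)</h1>', re.S | re.I)
# Assumption: the cover price sits in a span whose id is "rrpSpan" + digits.
PRICE_RE = re.compile(
    r'Cover\s+Price:.*?<span[^>]*id="rrpSpan\d+"[^>]*>(.*?)</span>',
    re.S | re.I)


def clean(fragment):
    """Drop tags, decode entities such as &amp;, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(html.unescape(text).split())


def extract_title_and_price(page):
    """Return (title, cover_price); a missing value becomes an empty string."""
    title = TITLE_RE.search(page)
    price = PRICE_RE.search(page)
    return (clean(title.group(1)) if title else "",
            clean(price.group(1)) if price else "")
```

On the snippet above this would give "4-Wheel ATV Action Magazine" and "$59.88". Line breaks inside the markup do not matter, because the patterns match across lines.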

Then I need the product description, which appears under a heading that 
always starts with "About". It is found here:

    <div class="automaticly_rel">
      <h2 class="car_absol_title3">About 4-Wheel ATV Action Magazine</h2>
      <div class="padding1"></div>
      <div class="white_corner_tl"></div><div class="white_corner_t" style="width:773px;"></div><div class="white_corner_tr"></div>
      <div class="white_corner_m" style="width:783px;">
        <div class="white_corner_m_inner4">
          <p class="gray_txt" style="line-height:18px;">4-Wheel ATV Action is the ultimate high-performance all terrain vehicle magazine. Each issue gives you full racing coverage, results and schedules, ATV performance tests and shootout competitions, product reviews, and interviews with leading ATV personalities. This magazine subscription brings you all the best action with full-color photos and in-depth articles.</p>
          <div class="clear"></div>
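
A possible way to grab the description in Python, again standard library only. The sketch assumes the description is the first `<p class="gray_txt">` that follows the `<h2 class="car_absol_title3">` heading beginning with "About", as in the snippet above. `extract_description` is a made-up name.

```python
import html
import re

# Assumption: the "About ..." <h2> is followed by the description in the
# first <p> whose class is exactly "gray_txt".
DESC_RE = re.compile(
    r'<h2[^>]*class="car_absol_title3"[^>]*>\s*About\b.*?</h2>'
    r'.*?<p[^>]*class="gray_txt"[^>]*>(.*?)</p>',
    re.S | re.I)


def extract_description(page):
    """Return the description paragraph as plain text, or "" if not found."""
    match = DESC_RE.search(page)
    if not match:
        return ""
    text = re.sub(r"<[^>]+>", " ", match.group(1))  # drop any inline tags
    return " ".join(html.unescape(text).split())    # decode entities, tidy spaces
```

Collapsing whitespace keeps the description on one line, which makes the resulting spreadsheet cell easier to work with.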

Then (if possible) I need the related magazines that are listed in a section 
below, which looks like this:

    <div class="mid_bg2">
      <div class="popular_magzine_mid_inner">
        <div class="popluar_cart_box">
          <div class="magz_thumb_bg"><a href="../5498/dirt-rider.html"><img src="../../shopimages/products/thumbnails/extra/dirtriderjuly2013.jpg" border="0" width="91" alt="Dirt Rider" class="magz_thumb_padding"></a></div>
          <div class="clear"></div>
          <div align="center" class="popular_blue_txt"><a href="../5498/dirt-rider.html">Dirt Rider</a></div>
          <div class="clear"></div>
          <div class="popular_gray_txt2" align="center">Normal Price: <strike>$59.88/year</strike></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt3">Our Price:</span> <span class="popular_red_txt2">$11.99</span></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt2">You save:</span> <span class="popular_black_txt1">$47.89 (79.98%)</span></div>
          <div class="clear"></div>
          <div align="center"><a href="../5498/dirt-rider.html"><img src="../../images/leran_more_butt.png" border="0" alt=""></a></div>
        </div>
        <div class="popluar_cart_box_gap">&nbsp;</div>
        <div class="popluar_cart_box">
          <div class="magz_thumb_bg"><a href="../5767/motocross-action.html"><img src="../../shopimages/products/thumbnails/extra/Motocross-Action-Magazine.jpg" border="0" width="91" alt="Motocross Action" class="magz_thumb_padding"></a></div>
          <div class="clear"></div>
          <div align="center" class="popular_blue_txt"><a href="../5767/motocross-action.html">Motocross Action</a></div>
          <div class="clear"></div>
          <div class="popular_gray_txt2" align="center">Normal Price: <strike>$59.88/year</strike></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt3">Our Price:</span> <span class="popular_red_txt2">$19.99</span></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt2">You save:</span> <span class="popular_black_txt1">$39.89 (66.62%)</span></div>
          <div class="clear"></div>
          <div align="center"><a href="../5767/motocross-action.html"><img src="../../images/leran_more_butt.png" border="0" alt=""></a></div>
        </div>
        <div class="popluar_cart_box_gap">&nbsp;</div>
        <div class="popluar_cart_box">
          <div class="magz_thumb_bg"><a href="../7806/4-wheel---off-road.html"><img src="../../shopimages/products/thumbnails/extra/4wheeljuly2013.jpg" border="0" width="91" alt="4 Wheel &amp; Off Road" class="magz_thumb_padding"></a></div>
          <div class="clear"></div>
          <div align="center" class="popular_blue_txt"><a href="../7806/4-wheel---off-road.html">4 Wheel &amp; Off Road</a></div>
          <div class="clear"></div>
          <div class="popular_gray_txt2" align="center">Normal Price: <strike>$47.88/year</strike></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt3">Our Price:</span> <span class="popular_red_txt2">$11.99</span></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt2">You save:</span> <span class="popular_black_txt1">$35.89 (74.96%)</span></div>
          <div class="clear"></div>
          <div align="center"><a href="../7806/4-wheel---off-road.html"><img src="../../images/leran_more_butt.png" border="0" alt=""></a></div>
        </div>
        <div class="popluar_cart_box_gap">&nbsp;</div>
        <div class="popluar_cart_box">
          <div class="magz_thumb_bg"><a href="../4576/dirt-wheels.html"><img src="../../shopimages/products/thumbnails/extra/Dirt-Wheels.jpg" border="0" width="91" alt="Dirt Wheels" class="magz_thumb_padding"></a></div>
          <div class="clear"></div>
          <div align="center" class="popular_blue_txt"><a href="../4576/dirt-wheels.html">Dirt Wheels</a></div>
          <div class="clear"></div>
          <div class="popular_gray_txt2" align="center">Normal Price: <strike>$66.00/year</strike></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt3">Our Price:</span> <span class="popular_red_txt2">$19.99</span></div>
          <div class="clear"></div>
          <div align="center"><span class="popular_gray_txt2">You save:</span> <span class="popular_black_txt1">$46.01 (69.71%)</span></div>
          <div class="clear"></div>
          <div align="center"><a href="../4576/dirt-wheels.html"><img src="../../images/leran_more_butt.png" border="0" alt=""></a></div>
        </div>
        <div class="popluar_cart_box_gap">&nbsp;</div>
      </div>
    </div>
    <div class="mid_bot_bg"></div>


The related magazines may not be listed at all, and there can be any number 
of them (0-20 different titles).
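
One way the related titles could be collected in Python, sketched with the standard library. It assumes each related magazine's name is the link text inside the `popular_blue_txt` div, as in the section above. The name `extract_related` and the `limit` parameter are illustrative only.

```python
import html
import re

# Assumption: every related title is the <a> text directly inside an
# element with class "popular_blue_txt".
RELATED_RE = re.compile(
    r'class="popular_blue_txt"[^>]*>\s*<a\b[^>]*>(.*?)</a>', re.S | re.I)


def extract_related(page, limit=20):
    """Return up to `limit` related magazine names in page order, no repeats."""
    names = []
    for raw in RELATED_RE.findall(page):
        text = re.sub(r"<[^>]+>", " ", raw)           # drop any nested tags
        name = " ".join(html.unescape(text).split())  # "&amp;" becomes "&"
        if name and name not in names:
            names.append(name)
    return names[:limit]
```

A page without the section simply yields an empty list, which covers the case where no related magazines are listed.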

I know it must be possible to gather this data from the many HTML files I 
have (product pages), but I cannot figure out how to script it. I also need 
all the data listed in an Excel sheet or CSV file with this structure:

Magazine title, Cover Price, Product description, Related Magazines (up 
to 20 columns).
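
To make the intended layout concrete, here is a small Python sketch that writes such rows with the standard `csv` module. The header labels for the related columns ("Related Magazine 1" and so on) and the function names are assumptions; only the column order comes from the structure described above.

```python
import csv

MAX_RELATED = 20  # "up to 20 columns" of related magazines

HEADER = (["Magazine title", "Cover Price", "Product description"]
          + [f"Related Magazine {i}" for i in range(1, MAX_RELATED + 1)])


def to_row(title, price, description, related):
    """Build one fixed-width row, padding unused related columns with ""."""
    related = list(related)[:MAX_RELATED]
    padding = [""] * (MAX_RELATED - len(related))
    return [title, price, description] + related + padding


def write_csv(records, out_file):
    """Write a header plus one row per (title, price, description, related)."""
    writer = csv.writer(out_file)
    writer.writerow(HEADER)
    for title, price, description, related in records:
        writer.writerow(to_row(title, price, description, related))
```

The output file would be opened with something like `open("magazines.csv", "w", newline="", encoding="utf-8-sig")`. The file name is a placeholder, and the `utf-8-sig` encoding is suggested only because Excel tends to recognise it as UTF-8.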


Is this at all possible, and could you maybe point me in the right direction 
or send me a script based on the data I have presented here? That 
would be absolutely awesome!

Best Regards
Johan Niklasson

-- 
This is the BBEdit Talk public discussion group. If you have a 
feature request or would like to report a problem, please email
"[email protected]" rather than posting to the group.
Follow @bbedit on Twitter: <http://www.twitter.com/bbedit>
