From:             work at danemacmillan dot com
Operating system: centos6.4 (64bit), win7 (64bit)
PHP version:      5.4.15
Package:          *XML functions
Bug Type:         Bug
Bug description:DomDocument getAttribute return empty on IMG/src, LINK/href, 
with protocol-rela

Description:
------------
DomDocument's getAttribute will return an empty string on both an IMG tag's
SRC attribute, and a LINK's HREF attribute *when* the URLs provided are
protocol-relative. 

In the description below I'm going to refer to IMG tags only, but the same
issue stands for the LINK tag's HREF attribute as well. There may be
others, but these are the only two I discovered. 

The problem arises when scraping the SRC of IMG tags. If the IMG tag SRC
is a protocol-relative URL (an absolute URL beginning with "//" instead
of "http://"), getAttribute returns an empty string for it. Root-relative
paths are readable (e.g., "/img/landing.png").

I've used both the getElementsByTagName method and the Xpath method. They
both suffer from the same problem. However, the moment I prepend any
absolute URL with its designated protocol, the IMG SRC is readable. 

This problem does not exist for the A tag HREF attribute, nor the SCRIPT
tag SRC attribute; in each case the URL provided will be returned,
regardless of the URL format.
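
To compare the four tag/attribute pairs side by side without depending on a
live page, something like the following sketch could be used. The inline
markup and its example.com URLs are hypothetical placeholders, not the page
originally scraped, and the script only dumps what getAttribute returns; it
makes no claim about what a given PHP/libxml build will print.

```php
<?php
// Hypothetical fixture: every URL below is protocol-relative and made up.
$html = '<html><head>'
      . '<link rel="stylesheet" href="//www.example.com/css/eg.css" />'
      . '<script src="//www.example.com/js/eg.js"></script>'
      . '</head><body>'
      . '<a href="//www.example.com/page">link</a>'
      . '<img src="//www.example.com/img/eg.png" />'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Tag name => attribute that carries the URL.
$pairs = array('img' => 'src', 'link' => 'href', 'a' => 'href', 'script' => 'src');

$found = array();
foreach ($pairs as $tag => $attr) {
    foreach ($dom->getElementsByTagName($tag) as $node) {
        // Stored raw (no urldecode) so the parser's value is seen unchanged.
        $found[$tag][] = $node->getAttribute($attr);
    }
}

// var_dump shows an empty string explicitly as string(0) "".
var_dump($found);
```

If IMG and LINK come back as empty strings here while A and SCRIPT do not,
the problem is reproducible with a fixed input and independent of the
remote site.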

Test script:
---------------
To summarize, the first two URLs will be readable, and the third URL will
*not* be readable:

<img src="http://www.example.com/img/eg.png" />
<img src="/img/eg.png" />
<img src="//www.example.com/img/eg.png" />

I'll demonstrate two ways to grab the URLs, and they both fail with
protocol-relative URLs.

// use for both examples:
$html = file_get_contents("http://www.example.com");

// one way (Xpath)
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom); 
$data = array();
foreach($x->query('//img') as $node) {
    $data['img']['src'][] = urldecode($node->getAttribute('src'));
}

// another way (getElementsByTagName)
$doc = new DOMDocument();
@$doc->loadHTML($html);
$imgs = $doc->getElementsByTagName('img');
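// Note: this variant builds a flat list and skips empty values, so its
// print_r output has a different shape from the Xpath variant above.
$data = array();
for ($i = 0; $i < $imgs->length; $i++) {
    $img = $imgs->item($i);
    $src = $img->getAttribute('src'); // read the attribute once
    if ($src) {
        $data[] = urldecode($src);
    }
}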

// print results from either
print_r($data);
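
A self-contained variant of the script above may make the report easier to
verify. The sketch below is an assumption about how one might isolate the
behaviour, not part of the original test: it feeds the three IMG tags from
the summary directly to loadHTML, collects libxml warnings instead of
silencing them with @, skips urldecode, and uses hasAttribute to tell a
missing attribute apart from an empty one.

```php
<?php
// Inline fixture built from the three IMG tags listed in the summary.
$html = '<html><body>'
      . '<img src="http://www.example.com/img/eg.png" />'
      . '<img src="/img/eg.png" />'
      . '<img src="//www.example.com/img/eg.png" />'
      . '</body></html>';

// Collect parser warnings rather than hiding them with the @ operator.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$srcs = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $srcs[] = array(
        // false here would mean the attribute was dropped during parsing.
        'has_src' => $img->hasAttribute('src'),
        // Raw value, deliberately without urldecode.
        'raw'     => $img->getAttribute('src'),
    );
}

var_dump(count($warnings));
var_dump($srcs);
```

Because the input is fixed, any difference between the three entries can be
attributed to the parser rather than to the markup the remote server
happened to return.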

Expected result:
----------------
Array ( [img] => Array ( [src] => Array ( [0] =>
//www.example.com/img/eg.png ) ) )

Actual result:
--------------
Array ( [img] => Array ( [src] => Array ( [0] =>  ) ) )

-- 
Edit bug report at https://bugs.php.net/bug.php?id=64946&edit=1