https://issues.apache.org/bugzilla/show_bug.cgi?id=45575
Summary: [PATCH] Code to know if a Range is in body,
header/footer, footnote etc.
Product: POI
Version: 3.0-dev
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: HWPF
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Created an attachment (id=22394)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22394)
Simple test doc with body, header/footer, annotations, footnotes and endnotes
Using a small trick (based on text length) it's possible to get the location
of a Range (body? header/footer? footnote? etc.). For example, suppose we
have 5 character runs:
1) coded in ASCII, ending at 2000
2) coded in Unicode, ending at 4050
3) coded in ASCII, ending at 2100
4) coded in Unicode, ending at 4200
5) coded in Unicode, ending at 4500
and that the ccpText field of the document they belong to is 2100.
If every character run were in ASCII (we can tell whether a character run is
Unicode or ASCII by comparing its length in characters, taken from its text,
with its length in bytes, taken from end - start), the end values would be
1) 2000
2) 2025
3) 2100
4) 2100
5) 2250
and then, comparing *these* end values with ccpText, we can conclude that the
character runs are
1) in body
2) in body
3) at end of body
4) at end of body
5) out of body, maybe in footnote
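The worked example above can be reproduced with a few lines of plain Java.
This is only a sketch: the class RunEndSketch and its methods are hypothetical
names, not part of POI, and the encoding of each run is passed in as a flag
instead of being detected from the text.

```java
final class RunEndSketch {
    /** Converts a byte-based end offset into a character-based one. */
    static int normalisedEnd(int end, boolean unicode) {
        // Unicode runs are assumed to use two bytes per character.
        return unicode ? end / 2 : end;
    }

    /** Describes where a normalised end offset lies relative to ccpText. */
    static String describe(int normalisedEnd, int ccpText) {
        if (normalisedEnd < ccpText) {
            return "in body";
        }
        if (normalisedEnd == ccpText) {
            return "at end of body";
        }
        return "out of body";
    }

    public static void main(String[] args) {
        int ccpText = 2100;
        // The five runs of the example: end offsets and encodings.
        int[] ends = {2000, 4050, 2100, 4200, 4500};
        boolean[] unicode = {false, true, false, true, true};
        for (int i = 0; i < ends.length; i++) {
            int n = normalisedEnd(ends[i], unicode[i]);
            System.out.println((i + 1) + ") " + n + " -> " + describe(n, ccpText));
        }
    }
}
```

Running main prints the normalised ends 2000, 2025, 2100, 2100 and 2250 with
the same body/out-of-body verdicts as the lists above.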
This same algorithm can be applied to all Range types (paragraph, section, and
so on) and to all locations (body, header/footer, footnote, etc.)
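Extending the comparison to every location amounts to checking the end offset
against a running sum of the per-location character counts. The sketch below
assumes the locations are stored in the order body, footnotes,
headers/footers, annotations, endnotes (the order used by the
getLocationType() method proposed in this report). The class
SubDocumentSketch, the method locate and the character counts in main are
hypothetical and only illustrate the idea.

```java
final class SubDocumentSketch {
    // Location names in the order their text is assumed to be stored.
    static final String[] NAMES = {
        "BODY", "FOOTNOTE", "HEADER_FOOTER", "ANNOTATION", "ENDNOTE"
    };

    /**
     * Returns the name of the first location whose cumulative upper limit
     * is not exceeded by endInChars, or "UNKNOWN" if none matches.
     * lengths[i] is the character count of location i.
     */
    static String locate(int endInChars, int[] lengths) {
        int limit = 0;
        for (int i = 0; i < lengths.length && i < NAMES.length; i++) {
            limit += lengths[i];
            if (endInChars <= limit) {
                return NAMES[i];
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // Hypothetical character counts: body text, footnotes,
        // headers/footers, annotations, endnotes.
        int[] lengths = {2100, 150, 300, 80, 40};
        int[] ends = {2000, 2100, 2250, 2500, 2600, 2670, 3000};
        for (int end : ends) {
            System.out.println(end + " -> " + locate(end, lengths));
        }
    }
}
```

With these counts the cumulative limits are 2100, 2250, 2550, 2630 and 2670,
so an end offset of 2500 falls in the header/footer text and 3000 falls
outside every known location.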
To make this possible, it's necessary to:
1) add to FileInformationBlock class the new lines
   public int getCcpFtn() {
       return _longHandler.getLong(FIBLongHandler.CCPFTN);
   }

   public int getCcpHdd() {
       return _longHandler.getLong(FIBLongHandler.CCPHDD);
   }

   public int getCcpAtn() {
       return _longHandler.getLong(FIBLongHandler.CCPATN);
   }

   public int getCcpEdn() {
       return _longHandler.getLong(FIBLongHandler.CCPEDN);
   }
to get the lengths in characters of the footnotes, headers/footers,
annotations and endnotes respectively
2) create a new enum in "usermodel" package to represent locations
   public enum Location {
       BODY,
       FOOTNOTE,
       HEADER_FOOTER,
       ANNOTATION,
       ENDNOTE,
       UNKNOWN;
   }
Instead of an enum, a series of int constants defined in Range could be
used.
3) add to Range class the new member variable
protected Location _location = null;
and the new method
   public Location getLocationType() {
       if (_location == null) {
           // x holds the end of this range, expressed in characters
           int x = 0;
           int charLen = this.text().length();
           int byteLen = _end - _start;
           if (byteLen == charLen)
               x = _end;      // ASCII: one byte per character
           else
               x = _end / 2;  // Unicode: two bytes per character

           FileInformationBlock fib = _doc.getFileInformationBlock();
           // each upper limit is the running sum of the character counts
           if (x <= fib.getCcpText())
               _location = Location.BODY;
           else if (x <= fib.getCcpText() + fib.getCcpFtn())
               _location = Location.FOOTNOTE;
           else if (x <= fib.getCcpText() + fib.getCcpFtn()
                   + fib.getCcpHdd())
               _location = Location.HEADER_FOOTER;
           else if (x <= fib.getCcpText() + fib.getCcpFtn()
                   + fib.getCcpHdd() + fib.getCcpAtn())
               _location = Location.ANNOTATION;
           else if (x <= fib.getCcpText() + fib.getCcpFtn()
                   + fib.getCcpHdd() + fib.getCcpAtn() + fib.getCcpEdn())
               _location = Location.ENDNOTE;
           else
               _location = Location.UNKNOWN;
       }
       // computed once, then cached in _location
       return _location;
   }
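Step 2 mentions int constants as an alternative to the enum. A minimal sketch
of that variant follows. The constant names, their values and the helper
name() are assumptions made for illustration; in the proposal they would live
in Range, here they sit in a stand-alone hypothetical class so the example
compiles on its own.

```java
final class LocationConstantsSketch {
    // Hypothetical int constants standing in for the Location enum.
    static final int LOCATION_BODY = 0;
    static final int LOCATION_FOOTNOTE = 1;
    static final int LOCATION_HEADER_FOOTER = 2;
    static final int LOCATION_ANNOTATION = 3;
    static final int LOCATION_ENDNOTE = 4;
    static final int LOCATION_UNKNOWN = 5;

    /** Maps a constant to a readable name; an enum provides this for free. */
    static String name(int location) {
        switch (location) {
            case LOCATION_BODY:          return "BODY";
            case LOCATION_FOOTNOTE:      return "FOOTNOTE";
            case LOCATION_HEADER_FOOTER: return "HEADER_FOOTER";
            case LOCATION_ANNOTATION:    return "ANNOTATION";
            case LOCATION_ENDNOTE:       return "ENDNOTE";
            default:                     return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        for (int i = LOCATION_BODY; i <= LOCATION_UNKNOWN; i++) {
            System.out.println(i + " = " + name(i));
        }
    }
}
```

The trade-off is that callers can pass any int, so a value outside the
defined constants is only caught by the default branch, whereas the enum
rules such values out at compile time.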
This is a simple test class (perhaps it could be turned into a JUnit
test case?) to test my code:
import java.io.FileInputStream;

import javax.swing.JFileChooser;
import javax.swing.JOptionPane;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class QuickTest
{
    public QuickTest()
    {
    }

    public static void main(String[] args)
    {
        try
        {
            // let the user pick the Word document to inspect
            JFileChooser jfc = new JFileChooser();
            int esito = jfc.showOpenDialog(null);
            if(esito != JFileChooser.APPROVE_OPTION)
            {
                JOptionPane.showMessageDialog(null, "No file selected");
            }
            else
            {
                String percorso = jfc.getSelectedFile().getAbsolutePath();
                HWPFDocument doc =
                    new HWPFDocument(new FileInputStream(percorso));
                Range r = doc.getRange();
                for(int i = 0; i < r.numParagraphs(); i++)
                {
                    //Paragraph, CharacterRun, Section... it's equivalent
                    Paragraph cr = r.getParagraph(i);
                    // print the trimmed paragraph text and its location
                    System.out.println("<" + cr.text().trim() + "> "
                        + cr.getLocationType());
                }
            }
        }
        catch(Exception er)
        {
            er.printStackTrace();
        }
    }
}
which, applied to the test doc I have attached, produces the output
<BODY TEXT FRAGMENT 1> BODY
<BODY TEXT FRAGMENT 2> BODY
<> BODY
<FOOTNOTE TEXT 1> FOOTNOTE
<FOOTNOTE TEXT 2> FOOTNOTE
<> FOOTNOTE
<> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<HEADER TEXT FRAGMENT 1> HEADER_FOOTER
<HEADER TEXT FRAGMENT 2> HEADER_FOOTER
<> HEADER_FOOTER
<FOOTER TEXT FRAGMENT 1> HEADER_FOOTER
<FOOTER TEXT FRAGMENT 2> HEADER_FOOTER
<> HEADER_FOOTER
<> HEADER_FOOTER
<ANNOTATION 1> ANNOTATION
<ANNOTATION 2> ANNOTATION
<> ANNOTATION
<ENDNOTE TEXT> ENDNOTE
<> ENDNOTE
<> UNKNOWN