Re: [DotNetDevelopment] Re: PDF To Data View...

Aman Sharma Sun, 22 Aug 2010 07:58:34 -0700

Hi,

*Solution :-*


I asked about this problem a while ago.

Here are the steps I tried that failed:

1. I downloaded the "PDFBox" DLLs and exposed all the necessary methods. This
failed with security exceptions. I set the trust level to Full and tried
settings such as AllowPartiallyTrustedCallers, but all in vain: the
exceptions remained.

2. I tried the "iTextSharp" DLLs. That got rid of the security exceptions, but
another error cropped up: "Resource or File not found". I checked the path as
well, but still got System.IO exceptions. :(

3. I tried "PDFToText.exe", using the following code:

=======================================

// Failed attempt: run pdftotext.exe on the uploaded file
string strFile = Server.MapPath(FileUpload1.PostedFile.FileName);

System.Diagnostics.Process p = new System.Diagnostics.Process();
p.StartInfo.Arguments = " -raw -htmlmeta" + " " + strFile + " " +
    "E:\\PDFCont\\output.htm";
p.StartInfo.FileName = Page.MapPath("pdftotext.exe");
p.StartInfo.UseShellExecute = false;
p.StartInfo.CreateNoWindow = false;
p.StartInfo.RedirectStandardOutput = false;

p.Start();
p.WaitForExit();
System.Threading.Thread.Sleep(3000);
===============================

I was still stuck with System.IO exceptions.
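
A guess at what may have gone wrong in step 3, not a confirmed diagnosis:
PostedFile.FileName is the name sent by the browser, so Server.MapPath on it
can point at a file that was never saved on the server, and unquoted paths
break as soon as they contain a space. Below is a minimal sketch that checks
the inputs and quotes the paths before starting the process. The class
PdfToTextRunner and its methods Quote, BuildArguments and Run are hypothetical
names, and the sketch assumes the upload has already been saved to disk
(for example with FileUpload1.SaveAs).

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class PdfToTextRunner
{
    // Hypothetical helper: wrap a path in double quotes so spaces survive argument parsing.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Builds the same flags as the original post (-raw -htmlmeta), with quoted paths.
    public static string BuildArguments(string pdfPath, string outputPath)
    {
        return "-raw -htmlmeta " + Quote(pdfPath) + " " + Quote(outputPath);
    }

    // Runs the executable and returns its exit code; fails early if an input file is missing.
    public static int Run(string exePath, string pdfPath, string outputPath)
    {
        if (!File.Exists(exePath))
            throw new FileNotFoundException("pdftotext.exe not found", exePath);
        if (!File.Exists(pdfPath))
            throw new FileNotFoundException("PDF not found", pdfPath);

        ProcessStartInfo psi = new ProcessStartInfo();
        psi.FileName = exePath;
        psi.Arguments = BuildArguments(pdfPath, outputPath);
        psi.UseShellExecute = false;
        psi.CreateNoWindow = true;
        psi.RedirectStandardError = true;

        using (Process p = Process.Start(psi))
        {
            // Read stderr before waiting so a full pipe cannot block the child process.
            string errors = p.StandardError.ReadToEnd();
            p.WaitForExit();
            if (p.ExitCode != 0) Console.Error.WriteLine(errors);
            return p.ExitCode;
        }
    }
}
```

If the files exist and the call still fails, it may be worth checking whether
the account running the web application can write to the output folder.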


*Working Code :-*
Finally I tried the "ASPPDFLib" DLLs, and they worked well for me. :) I had to
extract text from the PDF file and generate XML from it. Here is the working
solution:

===========================================

using System;
using System.Collections;
using System.Data;
using System.IO;
using System.Reflection;   // Missing.Value for the optional COM arguments
using ASPPDFLib;

// The methods below belong in the page's code-behind class.

protected void btnRead_Click(object sender, EventArgs e)
{
    // Extract the text, convert it to XML, and load the XML into a DataSet
    string strExtractTxt = getPDFText(@"E:\PDFCont\Tabular.pdf");
    StringReader sr = new StringReader(GenerateXML(strExtractTxt));
    DataSet dsInfo = new DataSet();
    dsInfo.ReadXml(sr);
}

public string getPDFText(string pstrFilePath)
{
    IPdfManager objPdf = new PdfManager();

    // Open a PDF file for text extraction
    IPdfDocument objDoc = objPdf.OpenDocument(pstrFilePath, Missing.Value);

    string strText = "";
    foreach (IPdfPage objPage in objDoc.Pages)
    {
        strText += objPage.ExtractText(Missing.Value);
    }

    // Encode characters such as <, > and & so the text can be embedded in XML
    strText = Server.HtmlEncode(strText);
    return strText.Trim();
}

public string GenerateXML(string pstrExtractTxt)
{
    string[] strSplitArr = pstrExtractTxt.Split(' ');
    ArrayList strSplitArrCopy = new ArrayList();
    string strXML = "<Information>";

    // Keep only the non-empty tokens
    for (int x = 0; x < strSplitArr.Length; x++)
    {
        if (strSplitArr[x] != string.Empty) { strSplitArrCopy.Add(strSplitArr[x]); }
    }

    // Tokens 0..2 are the column headers (Name, Age, Sex in Tabular.pdf);
    // every following group of three tokens becomes one <Person> row.
    int i = 0;
    for (int x = 3; x < strSplitArrCopy.Count; x++)
    {
        if (i == 0) strXML += "<Person>";
        strXML += "<" + strSplitArrCopy[i] + ">" + strSplitArrCopy[x] + "</" +
            strSplitArrCopy[i] + ">";
        if (i == 2) { strXML += "</Person>"; i = 0; }
        else i += 1;
    }

    // Close a trailing partial row so the XML stays well-formed
    if (i != 0) strXML += "</Person>";

    strXML += "</Information>";
    return strXML;
}

===============================================
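
To try the XML step without the PDF library, here is a small console sketch
that runs the same idea on a hard-coded string holding the sample table from
earlier in this thread. The class TableXmlDemo and the columnCount parameter
are hypothetical; the sketch also assumes that no cell value contains
whitespace and that the headers are valid XML element names.

```csharp
using System;
using System.Data;
using System.IO;
using System.Security;
using System.Text;

public static class TableXmlDemo
{
    // Treats the first columnCount tokens as headers and every later
    // complete group of columnCount tokens as one <Person> row.
    public static string GenerateXml(string text, int columnCount)
    {
        string[] tokens = text.Split(new char[] { ' ', '\t', '\r', '\n' },
                                     StringSplitOptions.RemoveEmptyEntries);
        StringBuilder xml = new StringBuilder("<Information>");

        // Integer division drops a trailing incomplete row.
        int rowCount = (tokens.Length - columnCount) / columnCount;
        for (int r = 0; r < rowCount; r++)
        {
            xml.Append("<Person>");
            for (int c = 0; c < columnCount; c++)
            {
                string name = tokens[c];
                string value = tokens[columnCount + r * columnCount + c];
                xml.Append("<").Append(name).Append(">")
                   .Append(SecurityElement.Escape(value))   // escape <, >, & and quotes
                   .Append("</").Append(name).Append(">");
            }
            xml.Append("</Person>");
        }

        xml.Append("</Information>");
        return xml.ToString();
    }

    public static void Main()
    {
        // Sample table from the thread, flattened the way extracted text might look.
        string extracted = "Name Age Sex A 20 M B 18 F C 15 M D 1 F";

        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(GenerateXml(extracted, 3)));

        // Print one line per row of the inferred "Person" table.
        foreach (DataRow row in ds.Tables["Person"].Rows)
        {
            Console.WriteLine("{0} {1} {2}", row["Name"], row["Age"], row["Sex"]);
        }
    }
}
```

Counting whole rows up front means a truncated last row is dropped instead of
leaving an unclosed element in the XML.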

I hope this solution helps anyone who is looking for PDF text
extraction.


On Wed, Aug 18, 2010 at 2:15 AM, Aman Sharma <[email protected]>wrote:

> I used iTextSharp as well, but got this exception:
>
> "System.IO.IOException: C:\inetpub\wwwroot\ReadPDF\Tabular.pdf not
> found as file or resource."
>
> The exception occurs although the PDF file was in the correct place.
>
> Here is the code I used with iTextSharp:
>
> string strFile = Server.MapPath("Tabular.pdf");
> PdfReader pdfReader = new PdfReader(strFile);
>
> I get the exception above on the PdfReader line.
> Please let me know if you have any idea.
>
>
> On Tue, Aug 17, 2010 at 9:55 AM, alandgri <[email protected]> wrote:
>
>> iTextSharp can read from PDF files, if I recall correctly.
>>
>> On Aug 16, 12:34 am, Aman Sharma <[email protected]> wrote:
>> > Check out this site...
>> >
>> > http://www.pdf-technologies.com/demos/pdfTextExtract.aspx
>> >
>> > They have done exactly this.
>> >
>> > Use this tabular pdf for example :-
>> >
>> > Name  Age  Sex
>> > A     20   M
>> > B     18   F
>> > C     15   M
>> > D     1    F
>> >
>> > I tried PDFBox, which gave me a security exception.
>> >
>> > Now I am trying TallPDF.NET and PDFKit.NET. Let's see whether they
>> > will work for me. :)
>> >
>> > If anyone has an idea for converting a PDF (tabular format) to text,
>> > please guide me.
>> >
>> > On Sun, Aug 15, 2010 at 8:35 PM, Aman Sharma <[email protected]> wrote:
>> >
>> > > The data is in a PDF, in tabular form.
>> >
>> > > I tried to use the open-source "PDFBox" but got the following error:
>> >
>> > > *Security Exception*
>> > > *Description: *The application attempted to perform an operation not
>> > > allowed by the security policy.  To grant this application the required
>> > > permission please contact your system administrator or change the
>> > > application's trust level in the configuration file.
>> >
>> > > *Exception Details: *System.Security.SecurityException: That assembly
>> > > does not allow partially trusted callers.
>> >
>> > > I don't know what to do.
>> >
>> > > On Sun, Aug 15, 2010 at 5:59 PM, Stephen Russell <[email protected]> wrote:
>> >
>> > >> On Sun, Aug 15, 2010 at 3:40 AM, Aman Sharma <[email protected]> wrote:
>> > >> > Hi,
>> >
>> > >> > Is it possible in ASP.NET to read data from a PDF file and show it
>> > >> > in a Data View?
>> > >> -----------------------------
>> >
>> > >> Is the data from an Excel sheet or a report?
>> >
>> > >> There are different PDF tools for VS and some can manipulate the raw
>> > >> data where others just make the content.
>> >
>> > >> --
>> > >> Stephen Russell
>> >
>> > >> Sr. Production Systems Programmer
>> > >> CIMSgts
>> >
>> > >> 901.246-0159 cell
>>
>
>
